This paper discusses two goodness-of-fit testing problems. The first problem pertains to fitting an error distribution to an assumed nonlinear parametric regression model, while the second pertains to fitting a parametric regression model when the error distribution is unknown. For the first problem the paper contains tests based on a certain martingale type transform of residual empirical processes. The advantage of this transform is that the corresponding tests are asymptotically distribution free. For the second problem the proposed asymptotically distribution free tests are based on innovation martingale transforms. A Monte Carlo study shows that the simulated level of the proposed tests is close to the asymptotic level for moderate sample sizes.

1. Introduction. This paper is concerned with developing asymptotically distribution free tests for two testing problems. The first problem pertains to testing a goodness-of-fit hypothesis about the error distribution in a class of nonlinear regression models. The second problem pertains to fitting a regression model in the presence of the unknown error distribution. The tests are obtained via certain martingale transforms of some residual empirical processes for the first problem and partial sum residual empirical processes for the second problem.

To be more precise, let $\Theta$ be an open subset of the $q$-dimensional Euclidean space and let $\{\mu(\cdot,\vartheta);\ \vartheta\in\Theta\}$ be a parametric family of functions from $\mathbb{R}^p$ to $\mathbb{R}$. For a pair $(X,Y)$ of a $p$-dimensional random vector $X$ with distribution function (d.f.) $H$ and a one-dimensional random variable (r.v.) $Y$ with finite expectation, let

$$m(x) := E[Y \mid X = x], \qquad x\in\mathbb{R}^p,$$
E. V. KHMALADZE AND H. L. KOUL

denote the regression function of $Y$ on $X$. In the first problem of interest one assumes $m$ is a member of a parametric family $\{\mu(\cdot,\vartheta);\ \vartheta\in\Theta\}$ and one observes a sequence $\{(X_i,Y_i),\ 1\le i\le n\}$ such that for some $\theta\in\Theta$ the errors

$$\varepsilon_i(\theta) = Y_i - \mu(X_i,\theta), \qquad 1\le i\le n, \tag{1.1}$$

are independent, identically distributed (i.i.d.) r.v.'s with expected value 0.
Let $F$ be a specified distribution function with mean zero and finite Fisher information for location, that is, $F$ is absolutely continuous with a density $f$ whose a.e. derivative $f'$ satisfies

$$0 < \int\Big(\frac{f'}{f}\Big)^2 dF < \infty. \tag{1.2}$$

The problem of interest is to test the hypothesis

$$H_0:\ \text{the d.f. of } \varepsilon_1(\theta) \text{ is } F,$$

against a class of all sequences of local (contiguous) alternatives where the error d.f.'s $A_n$ are such that for some $a\in L_2(\mathbb{R},F)$,

$$\Big(\frac{dA_n}{dF}\Big)^{1/2} = 1 + \frac{1}{2\sqrt n}\,a + r_n, \qquad \int a\,dF = 0, \qquad n\int r_n^2\,dF = o(1). \tag{1.3}$$

Occasionally, we will also insist that $a$ satisfy the orthogonality assumption

$$\int a\,\frac{f'}{f}\,dF = 0. \tag{1.4}$$

In the second problem one is again given independent observations $\{(X_i,Y_i),\ 1\le i\le n\}$ such that the $Y_i - m(X_i)$ are i.i.d. according to some distribution, not necessarily known, and one wishes to test the hypothesis

$$\tilde H_0:\ m(\cdot) = \mu(\cdot,\theta) \ \text{ for some } \theta\in\Theta. \tag{1.5}$$

The alternative to $\tilde H_0$ of interest here consists of all those sequences of functions $m_n(x)$ which "locally" deviate from one of the $\mu(x,\theta)$, that is, for some $\theta\in\Theta$ and for some function $\ell_\theta\in L_2(\mathbb{R}^p,H)$,

$$m_n(x) = \mu(x,\theta) + \frac{1}{\sqrt n}\,\ell_\theta(x) + r_{n\theta}(x), \qquad \ell_\theta\perp\dot\mu_\theta, \qquad n\int r_{n\theta}^2(x)\,dH(x)\to 0, \tag{1.6}$$

while the errors $Y_i - m_n(X_i)$ are still i.i.d. Here $\dot\mu_\theta(x)$ is a vector of $L_2$-derivatives of $\mu(x,\theta)$ with respect to $\theta$, assumed to exist; see assumption (2.4).

Both of these testing problems are historically almost as old as the subject of statistics itself. Tests of $H_0$ based on various residual empirical processes have been discussed in the literature repeatedly; see, for example, Durbin (1973), Durbin, Knott and Taylor (1975), Loynes (1980), D'Agostino and Stephens (1986) and Koul (1992, 2002), among others. Several authors have addressed the problem of regression model fitting, that is, testing $\tilde H_0$; see, for example, Cox, Koh, Wahba and Yandell (1988), Eubank and Hart (1992, 1993), Eubank and Spiegelman (1990), Hardle and Mammen (1993), Koul and Ni (2004), An and Cheng (1991), Stute (1997), Stute, Gonzalez Manteiga and Presedo Quindimil (1998), Stute, Thies and Zhu (1998) and Stute and Zhu (2002), among others. The last five references propose tests based on certain marked empirical or partial sum processes, while the references cited before them base tests on nonparametric regression estimators. See also the review paper of MacKinnon (1992) for tests based on the least squares methodology, and the monograph of Hart (1997) and references therein for numerous other tests of $\tilde H_0$ based on smoothing methods in the case $p=1$.

However, it is well known that most of these tests are not asymptotically distribution free. This is true even for chi-square type tests, with the exception of the modified chi-square statistic studied in Nikulin (1973) in the context of empirical processes. It is also well documented in the literature that chi-square type tests often have relatively low power against many alternatives of interest; see, for example, Moore (1986). Hence a larger supply of asymptotically distribution free (ADF) goodness-of-fit tests with relatively good power functions is needed.

The aim of this paper is to propose a large class of such tests. These will be the tests based on statistics of a certain ADF modification and extension [see, e.g., (5.3) and (5.4)] of the (weighted) empirical process of residuals

$$W_n^{\gamma}(y) := n^{-1/2}\sum_{i=1}^n \gamma(X_i)\big[I\{\varepsilon_i(\hat\theta)\le y\} - F(y)\big], \qquad -\infty<y<\infty,$$

where $\gamma$ is a square integrable function with respect to $H$ and $\hat\theta$ is an estimator of $\theta$. The ADF versions of the Cramer-von Mises and the Kolmogorov-Smirnov tests will be particular cases of such tests. Write $W_n$ for $W_n^{\gamma}$ whenever $\gamma\equiv 1$; see Sections 3.2 and 5.
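As a concrete illustration, $W_n^{\gamma}$ can be evaluated on a finite grid of $y$ values. The following is a minimal sketch, assuming the residuals $\varepsilon_i(\hat\theta)$ and the weights $\gamma(X_i)$ have already been computed and that $F$ is supplied as a Python function; the names `weighted_residual_process` and `std_normal_cdf`, and the standard normal choice of $F$, are illustrative and not taken from the paper.

```python
import math

import numpy as np


def weighted_residual_process(residuals, weights, y_grid, cdf):
    """Evaluate W_n^gamma(y) = n^{-1/2} sum_i gamma(X_i) [I(eps_i <= y) - F(y)] at each y in y_grid.

    residuals : the residuals eps_i(theta_hat), shape (n,)
    weights   : the weights gamma(X_i), shape (n,); np.ones(n) gives the unweighted W_n
    cdf       : the hypothesized error d.f. F, as a scalar function
    """
    e = np.asarray(residuals, dtype=float)
    g = np.asarray(weights, dtype=float)
    y = np.asarray(y_grid, dtype=float)
    indic = (e[:, None] <= y[None, :]).astype(float)   # I(eps_i <= y), shape (n, m)
    F = np.array([cdf(v) for v in y])                  # F(y) on the grid, shape (m,)
    return g @ (indic - F[None, :]) / math.sqrt(e.size)


def std_normal_cdf(v):
    """Standard normal d.f., used here only as an example of a specified F."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(200)   # errors simulated under H_0 with F = N(0, 1), standing in for residuals
    W = weighted_residual_process(eps, np.ones(200), np.linspace(-3.0, 3.0, 61), std_normal_cdf)
    print(np.max(np.abs(W)))         # sup of |W_n| over the grid
```

With weights $I\{X_i\in B\}$ the same function evaluates the process (2.1) below at a fixed set $B$, with $\vartheta=\hat\theta$.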

As far as the problem of estimation of $\theta$ is concerned, certain weighted residual empirical processes play an indispensable role [cf. Koul (1992, 1996)]. A part of the objective of the present paper is to clarify the role of these processes with regard to the above goodness-of-fit testing problem.

To begin with, we shall discuss the basic structure of the first problem from a geometric perspective. This perspective was explored in the context of empirical processes in Khmaladze (1979). We shall show that under $H_0$ the asymptotic distribution of $W_n^{\gamma}$, and of its general function-parametric form $\xi_n(\gamma,\varphi;\hat\theta)$ [see (2.2)], is equivalent to that of the projection of (function-parametric) Brownian motion parallel to the tensor product $\dot\mu_\theta\cdot(f'/f)$. Since a "projection" is typically "smaller" than the original process, we can intuitively understand why, at least for the alternatives (1.3), substituting an estimator $\hat\theta$ will lead to an increase in asymptotic power, even in problems where the true value of the parameter is known. The distribution of this projection depends not only on the family of regression functions $\{\mu(\cdot,\vartheta);\ \vartheta\in\Theta\}$ and on $F$, but also on the estimator $\hat\theta$. Therefore, the limit distribution of any fixed statistic based on $W_n^{\gamma}$ or on $\xi_n(\gamma,\varphi;\hat\theta)$ will be very much model-dependent. However, using this "projection" point of view, we shall show in Section 3.2 that tests based on $W_n^{\gamma}$ corresponding to a certain nonconstant $\gamma$ may be useful because they may have simpler asymptotic behavior, but at the cost of some loss of asymptotic power, whereas tests based on $W_n$ will, in general, have higher asymptotic power.

But, as mentioned above, the asymptotic null distribution of $W_n$ is model dependent. The proposed martingale transforms of $W_n(F^{-1})$ will be shown to converge in distribution to a standard Brownian motion on $[0,1]$ under $H_0$, and hence tests based on these transforms will be ADF for testing $H_0$. It will also be shown that for any $\gamma$ this transform is one-to-one and therefore there is no loss of asymptotic power associated with it.

The paper also provides ADF tests for the problem of testing

$$H_\sigma:\ \text{the d.f. of } \varepsilon_i(\theta) \text{ is } F(y/\sigma),\ \forall y\in\mathbb{R},\ \text{for some } \sigma>0.$$

In the univariate design case, ADF tests for $\tilde H_0$ based on certain partial sum processes and using ideas of Khmaladze (1981) have been discussed by Stute, Thies and Zhu (1998). An extension of this methodology to the general case of a higher dimensional design is far from trivial. The second important goal of this paper is to provide this extension. Here too we first discuss this problem from a general geometric perspective. It turns out that the weighted partial sum processes that are natural to this problem are

$$\xi_n(B;\theta) := n^{-1/2}\sum_{i=1}^n I\{X_i\in B\}\,\psi\big(Y_i-\mu(X_i,\theta)\big),$$

for a fixed real valued function $\psi$ with $E\psi^2(\varepsilon)$ finite, where $B$ is a Borel measurable set in $\mathbb{R}^p$. Tests based on these processes and the innovation martingale transform ideas of Khmaladze (1993) [see, e.g., (6.4)] are shown to be ADF, that is, their asymptotic null distribution is free of the model $\mu(\cdot,\theta)$ and of the error distribution, but depends on the design distribution in the case $p>1$. These tests include those proposed in Stute, Thies and Zhu (1998), where $p=1$, $\psi(y)=y$ and $B=(-\infty,x]$, $x\in\mathbb{R}$.
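The partial sum process above is straightforward to evaluate when the sets $B$ are lower orthants $(-\infty,x]$ of $\mathbb{R}^p$ (coordinatewise), which reduces to $B=(-\infty,x]$ when $p=1$. A minimal sketch, assuming the residuals $Y_i-\mu(X_i,\theta)$ are already available; the function name `partial_sum_process` and the default $\psi(y)=y$ are illustrative choices.

```python
import numpy as np


def partial_sum_process(X, residuals, x_points, psi=lambda e: e):
    """n^{-1/2} sum_i I{X_i <= x} psi(Y_i - mu(X_i, theta)) for each x in x_points.

    The sets B are lower orthants (-inf, x] of R^p, with "<=" taken coordinatewise.
    X         : design, shape (n, p); a 1-D array is treated as p = 1
    residuals : Y_i - mu(X_i, theta), shape (n,)
    x_points  : corner points x, shape (m, p); a 1-D array is treated as p = 1
    """
    X = np.asarray(X, dtype=float)
    pts = np.asarray(x_points, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if pts.ndim == 1:
        pts = pts[:, None]
    in_B = np.all(X[:, None, :] <= pts[None, :, :], axis=2).astype(float)   # I{X_i in B_x}, shape (n, m)
    marks = psi(np.asarray(residuals, dtype=float))                           # psi(residual_i), shape (n,)
    return marks @ in_B / np.sqrt(X.shape[0])
```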

We mention that recently Stute and Zhu (2002) used the innovation approach of Khmaladze (1981) to derive ADF tests in a special case of the higher dimensional design, where the design vector appears in the null parametric regression function only in a linear form, for example, as in generalized linear models, and where the sets $B$ in $\xi_n(B;\theta)$ are taken to be half spaces. This again reduces the technical nature of the problem to the univariate case.

In another recent paper, Koenker and Xiao (2002) studied tests based on transformations of a different process, the regression quantile process, to test the hypothesis that the effect of the covariate vector $X$ on the location and/or on the location-scale of the conditional quantiles of $Y$, given $X$, is linear in $X$. They then used the Khmaladze approach to make these tests ADF. Based on several Monte Carlo experiments, Koenker and Xiao (2001) report that their tests have accurate size and respectable power.

The paper is organized as follows. Section 2 introduces some basic processes that are used to construct tests of the above hypotheses. It also discusses some asymptotics of these processes under $H_0$. Section 3 discusses some geometric implications of the asymptotics of Section 2, while Section 4 gives the martingale transforms of these processes whose asymptotic distribution under $H_0$ is known and free from $F$. Section 5 contains some computational formulas for these transformed processes. It also provides analogues of these ADF tests for nonrandom designs and for the case when the underlying observations form a stationary autoregressive process. Section 6 contains the ADF processes for testing $\tilde H_0$. Section 7 contains some simulation results to show how well the asymptotic level approximates the finite sample level for the proposed ADF tests. It is observed that even for sample size 40 this approximation is quite good for the chosen simulation study. See Section 7 for details.

2. Function-parametric regression processes with estimated parameter. 

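2.1. Function-parametric regression process. Consider a regression process as defined in Stute (1997):

$$\xi_n(B,y,\vartheta) := n^{-1/2}\sum_{i=1}^n I\{X_i\in B\}\big[I\{\varepsilon_i(\vartheta)\le y\} - F(y)\big], \qquad -\infty<y<\infty,\ \vartheta\in\Theta, \tag{2.1}$$

where $B$ is a Borel measurable set in the $p$-dimensional Borel space $(\mathbb{R}^p,\mathcal{B}(\mathbb{R}^p))$ and

$$\varepsilon_i(\vartheta) := Y_i - \mu(X_i,\vartheta), \qquad 1\le i\le n.$$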
We will also use the notation $I_B(X_i)$ for the indicator function $I\{X_i\in B\}$ interchangeably. It is natural to consider an extension of the above process where the indicator weights are replaced by some weight function $\gamma(X_i)$. The function $\gamma$ may be scalar- or vector-valued. The weak convergence of such processes in the $y$ variable and for a fixed $\gamma$ has been developed in Koul (1992, 1996) and Koul and Ossiander (1994).

It is no less natural to consider an extension of these weighted empiricals to processes where the second indicator, involving the error random variable $\varepsilon_i(\vartheta)$ in (2.1), is also replaced by a function. Consider, therefore, a function-parametric version of (2.1) indexed by a pair of functions $(\gamma,\varphi)$:

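$$\xi_n(\gamma,\varphi;\vartheta) := \int\gamma(x)\varphi(y)\,\xi_n(dx,dy;\vartheta) = n^{-1/2}\sum_{i=1}^n\gamma(X_i)\Big[\varphi(\varepsilon_i(\vartheta)) - \int\varphi(y)\,dF(y)\Big]. \tag{2.2}$$

We shall choose $\gamma\in L_2(\mathbb{R}^p,H)$ and $\varphi\in L_2(\mathbb{R},F)$. In this way one can say that $\xi_n$ is defined for the function $\alpha(x,y)=\gamma(x)\varphi(y)$, which is an element of $L := L_2(\mathbb{R}^{p+1},H\times F)$. For a general $\alpha\in L$ we certainly have

$$\xi_n(\alpha;\vartheta) := \int_{\mathbb{R}^{p+1}}\alpha(x,y)\,\xi_n(dx,dy;\vartheta) = n^{-1/2}\sum_{i=1}^n\Big(\alpha(X_i,\varepsilon_i(\vartheta)) - \int\alpha(X_i,y)\,dF(y)\Big).$$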

We will realize, however, that it is sufficient and natural for our present purpose to restrict $\alpha$ to be of the above product type. In the sequel, for any functional $S$ on $L$ we will use the notation $S(\alpha)$ or $S(\gamma,\varphi)$ interchangeably, whenever $\alpha=\gamma\cdot\varphi$.
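For product functions $\alpha=\gamma\cdot\varphi$, (2.2) is a single weighted sum, and the choice $\gamma=I_B$, $\varphi=I(\cdot\le y)$ gives back (2.1), since then $\int\varphi\,dF=F(y)$. The sketch below is a hypothetical helper, not code from the paper; it assumes the caller supplies the constant $\int\varphi\,dF$, which is known once $F$ is specified.

```python
import math

import numpy as np


def xi_n(gamma_vals, phi, residuals, phi_mean):
    """xi_n(gamma, phi; theta) = n^{-1/2} sum_i gamma(X_i) [phi(eps_i(theta)) - int phi dF], as in (2.2).

    gamma_vals : gamma(X_i), shape (n,)
    phi        : vectorized function applied to the residuals eps_i(theta)
    phi_mean   : the number int phi dF under the hypothesized F
    """
    g = np.asarray(gamma_vals, dtype=float)
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(g * (phi(e) - phi_mean)) / math.sqrt(e.size))


# Recovering (2.1): gamma = I_B with B = (-inf, 0.5], phi = I(. <= y) with y = 0,
# and int phi dF = F(0) = 0.5 for a standard normal F.
X = np.array([0.2, 0.7, 0.1, 0.4])
eps = np.array([-1.0, 0.5, 2.0, -0.3])
value = xi_n((X <= 0.5).astype(float), lambda e: (e <= 0.0).astype(float), eps, 0.5)
```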

The processes defined at (2.1) and (2.2) are obviously closely related: (2.1) represents a regression process as a random measure on $\mathbb{R}^{p+1}$, while (2.2) represents it as an integral with respect to this random measure. Also, (2.2) defines a linear functional on $L$.

The function-parametric version (2.2) will help to visualize in a natural 
way the geometric picture of what is involved when we estimate parameters 
and show why and when we need "martingale transformations" (Sections 4 
and 6) to obtain asymptotically distribution free tests. 

2.2. Asymptotic increments of $\xi_n$ with respect to parameter. Since $\theta$ is unknown, in order to base tests of $H_0$ on the process $\xi_n$ we will need to replace it by an estimator $\hat\theta$ in this process. This estimator will typically be assumed to be $n^{1/2}$-consistent, that is,

$$\|\hat\theta-\theta\| = O_p(n^{-1/2}). \tag{2.3}$$
There is thus a need to understand the behavior of $\xi_n(\alpha;\theta+n^{-1/2}v)$ as a process in $v\in\mathbb{R}^q$, $\|v\|\le k<\infty$. The first thing certainly is to consider the Taylor expansion of this function in $v$.

To do this, assume the following $L_2$-differentiability condition of the regression function $\mu(x,\vartheta)$ with respect to $\vartheta$: there exists a $q\times 1$ vector $\dot\mu_\theta$ of functions from $\mathbb{R}^p\times\Theta$ to $\mathbb{R}^q$ such that

$$\begin{aligned}
&\mu(x,\vartheta)-\mu(x,\theta) = \dot\mu_\theta^T(x)(\vartheta-\theta) + \rho_\mu(x;\vartheta,\theta),\\
&\int\dot\mu_\theta^T(x)\dot\mu_\theta(x)\,dH(x) < \infty,\\
&C_\theta := \int\dot\mu_\theta(x)\dot\mu_\theta^T(x)\,dH(x) \ \text{is positive definite},\\
&\int\sup_{\|\vartheta-\theta\|\le\epsilon}\rho_\mu^2(x;\vartheta,\theta)\,dH(x) = o(\epsilon^2), \quad \epsilon\to 0.
\end{aligned}\tag{2.4}$$
Here, and in the sequel, for any Euclidean vector $v$, $v^T$ denotes its transpose. Now, if additionally $\varphi$ is differentiable with derivative $\varphi'\in L_2(\mathbb{R},F)$ satisfying

$$\lim_{\delta\to 0}\int\sup_{|\Delta|\le\delta}\big|\varphi'(y-\Delta)-\varphi'(y)\big|^2\,dF(y) = 0, \tag{2.5}$$

then, with $\alpha(x,y)=\gamma(x)\varphi(y)$, we have the following proposition.

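Proposition 2.1. Under assumptions (2.4) and (2.5), the following holds for every $0<k<\infty$.

(i) For any $\gamma\in L_2(\mathbb{R}^p,H)$,

$$\sup_{\|v\|\le k}\Big|\xi_n(\alpha;\theta+n^{-1/2}v)-\xi_n(\alpha;\theta)+E[\gamma(X)\dot\mu_\theta^T(X)]\,E[\varphi'(\varepsilon)]\,v\Big| = o_p(1).$$

(ii) For $\gamma=\eta I_B$, $B\in\mathcal{B}(\mathbb{R}^p)$ and a fixed $\eta\in L_2(\mathbb{R}^p,H)$,

$$\sup_{B\in\mathcal{B},\ \|v\|\le k}\Big|\xi_n(\alpha;\theta+n^{-1/2}v)-\xi_n(\alpha;\theta)+n^{-1}\sum_{i=1}^n\eta(X_i)I_B(X_i)\dot\mu_\theta^T(X_i)\varphi'(\varepsilon_i)\,v\Big| = o_p(1).$$

Hence, under (2.3) one obtains

$$\xi_n(\alpha;\hat\theta) = \xi_n(\alpha;\theta) - n^{-1}\sum_{i=1}^n\eta(X_i)I_B(X_i)\dot\mu_\theta^T(X_i)\varphi'(\varepsilon_i)\,n^{1/2}(\hat\theta-\theta) + \rho_n(B), \tag{2.6}$$

where $\rho_n(B)$ is a sequence of stochastic processes indexed by $B\in\mathcal{B}$, tending to zero in probability uniformly in $B\in\mathcal{B}$.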
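The linear term in Proposition 2.1(ii) can be checked by simulation. The sketch below uses an illustrative setup that is our assumption, not the paper's: $p=q=1$, $\mu(x,\vartheta)=\vartheta x$ (so $\dot\mu_\theta(x)=x$) with $X$ uniform on $(0,1)$, standard normal errors, $\eta\equiv 1$, $B=\mathbb{R}$ and the smooth score $\varphi=\tanh$. It compares $\xi_n(\alpha;\theta+n^{-1/2}v)-\xi_n(\alpha;\theta)$ with $-n^{-1}\sum_i\dot\mu_\theta(X_i)\varphi'(\varepsilon_i)v$.

```python
import numpy as np


def expansion_check(n=200_000, theta=1.0, v=1.5, seed=1):
    """Return (increment of xi_n, linear term of Proposition 2.1(ii)) for mu(x, t) = t * x."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n)
    eps = rng.standard_normal(n)
    Y = theta * X + eps

    def xi(t):
        # gamma = 1, phi = tanh; the centring constant int phi dF cancels in the difference below
        return np.sum(np.tanh(Y - t * X)) / np.sqrt(n)

    increment = xi(theta + v / np.sqrt(n)) - xi(theta)
    dphi = 1.0 - np.tanh(eps) ** 2          # phi'(eps_i)
    linear = -np.mean(X * dphi) * v         # -n^{-1} sum_i mu_dot(X_i) phi'(eps_i) v
    return increment, linear


if __name__ == "__main__":
    inc, lin = expansion_check()
    print(inc, lin)                         # the two numbers should nearly agree
```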
The representation in (i) or in (2.6) will be very convenient and appropriate when dealing with the fitting of a regression model in Section 6. But for testing $H_0$ pertaining to the error distribution, as we will see in the next section, the differentiability of $\varphi$ is restrictive. We may wish, for example, to choose $\varphi$ to be an indicator function as in (2.1). Thus it is desirable to obtain an analog of the above proposition for as general a $\varphi$ as possible.

Towards this goal, let $\Phi$ denote the linear span of a class of nondecreasing real valued functions $\varphi(y)$, $y\in\mathbb{R}$, such that

$$\int\varphi^2(y)\,dF(y)<\infty, \qquad \Big(\int\big[\varphi(y-t)-\varphi(y-s)\big]^2\,dF(y)\Big)^{1/2}\le\nu(|t-s|), \quad -\epsilon<s,t<\epsilon, \tag{2.7}$$

for some $\epsilon>0$ and for some continuous function $\nu$ from $[0,\infty)$ to $[0,\infty)$ with $\nu(0)=0$ and $\int_0^1\log\nu^{-1}(t)\,dt$ finite. This is a wide class of functions and will be the source of our $\varphi$ in what follows.
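As a worked example of membership in $\Phi$, consider the indicator functions $\varphi_c(y)=I(y\le c)$ needed for the empirical processes (2.1). Note that $\varphi_c = 1 - I(y>c)$ is a difference of two nondecreasing functions, the constant 1 (for which the left side of (2.7) vanishes) and $I(y>c)$. The computation below additionally assumes, for illustration only, that the density $f$ is bounded, and reads the integrability requirement in (2.7) with an absolute value.

```latex
\begin{aligned}
\int\big[\varphi_c(y-t)-\varphi_c(y-s)\big]^2\,dF(y)
  &= \int\big[I(y\le c+t)-I(y\le c+s)\big]^2\,dF(y)
   = |F(c+t)-F(c+s)| \le \|f\|_\infty\,|t-s|,\\
\nu(u) &:= (\|f\|_\infty\,u)^{1/2}, \qquad \nu^{-1}(t) = t^2/\|f\|_\infty,\\
\int_0^1\big|\log\nu^{-1}(t)\big|\,dt &= \int_0^1\big|2\log t-\log\|f\|_\infty\big|\,dt < \infty .
\end{aligned}
```

So (2.7) holds for $\varphi_c$ with this $\nu$, which is continuous with $\nu(0)=0$.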
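For any two functions $\alpha,\beta\in L$, let

$$(\alpha,\beta) := \int\alpha(x,y)\beta(x,y)\,dH(x)\,dF(y).$$

Note that if $\alpha$ or both $\alpha,\beta$ are vector functions, then $(\alpha,\beta)$ or $(\alpha,\beta^T)$ is a vector or a matrix of coordinate-wise inner products. Let $\|\alpha\| := (\alpha^T,\alpha)^{1/2}$ for a vector function $\alpha$. Finally, let

$$\psi_f(y) := \frac{f'(y)}{f(y)}, \qquad m_\theta(x,y) := \dot\mu_\theta(x)\psi_f(y), \qquad x\in\mathbb{R}^p,\ y\in\mathbb{R}.$$

Note that

$$(m_\theta,m_\theta^T) = C_\theta\int\psi_f^2\,dF.$$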
We are ready to state 

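Proposition 2.2. Suppose that (1.2) and (2.4) hold. Then for $\alpha(x,y)=\gamma(x)\varphi(y)$ with $\gamma\in L_2(\mathbb{R}^p,H)$, $\varphi\in\Phi$,

$$\xi_n(\alpha;\hat\theta) = \xi_n(\alpha;\theta) + (\alpha,m_\theta^T)\,n^{1/2}(\hat\theta-\theta) + o_p(1). \tag{2.8}$$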

To appreciate some implications of (2.8) we need to consider those estimators of $\theta$ that admit an asymptotic linear representation. For the purpose of the present paper it would be enough to assume this. However, for completeness of the presentation we give a relatively broad set of sufficient conditions under which a class of M-estimators is asymptotically linear. Let $\{\eta_\vartheta,\ \vartheta\in\Theta\}$ be a family of $q$-dimensional functions on $\mathbb{R}^p$ with coordinates in $L_2(\mathbb{R}^p,H)$. Let $\beta_\vartheta := \eta_\vartheta\cdot\varphi$, $\varphi\in\Phi$. Define an M-estimator $\hat\theta$ to be a solution of the equation

$$\xi_n(\beta_{\hat\theta};\hat\theta) = 0. \tag{2.9}$$

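The following proposition gives a set of sufficient conditions for this estimator to be asymptotically linear.

Proposition 2.3. Suppose that (1.2) and (2.4) hold. In addition, suppose $\varphi\in\Phi$ and $\{\eta_\vartheta,\ \vartheta\in\Theta\}$ are such that

$$\int\sup_{\|\vartheta-\theta\|\le\epsilon}\|\eta_\vartheta-\eta_\theta\|^2\,dH = o(1), \qquad \epsilon\to 0, \tag{2.10}$$

and the matrix $(\beta_\theta,m_\theta^T)$ is nonsingular. Then $\hat\theta$ defined at (2.9) satisfies

$$n^{1/2}(\hat\theta-\theta) = -(\beta_\theta,m_\theta^T)^{-1}\xi_n(\beta_\theta;\theta) + o_p(1). \tag{2.11}$$

In particular, if $\{\dot\mu_\vartheta;\ \vartheta\in\Theta\}$ satisfies (2.10), then the solution $\hat\theta$ of the likelihood equation

$$\xi_n(m_{\hat\theta};\hat\theta) = 0 \tag{2.12}$$

has the asymptotic linear representation

$$n^{1/2}(\hat\theta-\theta) = -(m_\theta,m_\theta^T)^{-1}\xi_n(m_\theta;\theta) + o_p(1). \tag{2.13}$$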
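For a concrete case of (2.13), take an illustrative model (our assumption, not from the paper): $\mu(x,\vartheta)=\vartheta x$ with $X$ uniform on $(0.5,1.5)$ and $F$ standard normal. Then $\psi_f(y)=-y$, the likelihood equation (2.12) reduces to the least-squares normal equation $\sum_i X_i(Y_i-\hat\theta X_i)=0$, and $(m_\theta,m_\theta)=EX^2\int\psi_f^2\,dF=EX^2$. The sketch compares the two sides of (2.13) on simulated data; `linear_representation_check` is a hypothetical name.

```python
import numpy as np


def linear_representation_check(n=5000, theta=2.0, seed=3):
    """Return (n^{1/2}(theta_hat - theta), -(m, m)^{-1} xi_n(m; theta)) for mu(x, t) = t * x, F = N(0, 1)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.5, 1.5, n)
    eps = rng.standard_normal(n)
    Y = theta * X + eps
    theta_hat = np.sum(X * Y) / np.sum(X * X)      # solves sum_i X_i (Y_i - t X_i) = 0, i.e. (2.12)
    lhs = np.sqrt(n) * (theta_hat - theta)
    # m_theta(x, y) = mu_dot(x) psi_f(y) = x * (-y); since int psi_f dF = 0,
    # xi_n(m_theta; theta) needs no centring constant.
    xi_m = np.sum(X * (-eps)) / np.sqrt(n)
    ex2 = (0.5 ** 2 + 0.5 * 1.5 + 1.5 ** 2) / 3.0  # E X^2 for X ~ U(0.5, 1.5)
    rhs = -xi_m / ex2                              # (m_theta, m_theta) = E X^2 * int psi_f^2 dF = E X^2
    return lhs, rhs
```

The two sides differ only through the sample mean of $X_i^2$ versus $EX^2$, which is the $o_p(1)$ term in (2.13).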

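From now on $\hat\theta$ will stand for the solution of (2.12), and we shall use the abbreviated notation $\xi_n(\alpha)=\xi_n(\alpha;\theta)$, $\hat\xi_n(\alpha)=\xi_n(\alpha;\hat\theta)$ and $\tilde\xi_n(\alpha)=\xi_n(\alpha;\tilde\theta)$, where $\tilde\theta$ denotes an M-estimator defined by (2.9). Combining (2.8) with (2.11) and (2.13), we see that the leading terms of $\hat\xi_n$ and, in general, of $\tilde\xi_n$ can be represented as linear transformations of $\xi_n$:

$$\hat\xi_n(\alpha) = \xi_n(\alpha) - (\alpha,m_\theta^T)(m_\theta,m_\theta^T)^{-1}\xi_n(m_\theta) + o_p(1), \tag{2.14}$$

$$\tilde\xi_n(\alpha) = \xi_n(\alpha) - (\alpha,m_\theta^T)(\beta_\theta,m_\theta^T)^{-1}\xi_n(\beta_\theta) + o_p(1). \tag{2.15}$$

These linear transformations have a remarkably simple and convenient structure, as described in Section 2.3.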
2.3. Processes $\hat\xi_n$ and $\tilde\xi_n$ as projections. Let us use the notation $1$ for the function in $y$ identically equal to 1, so that, for example, $(\varphi,1)=\int\varphi(y)\,dF(y)$, and let $\varphi^0=\varphi-(\varphi,1)$; for $\alpha=\gamma\cdot\varphi$ let $\alpha^0=\gamma\cdot\varphi^0$. It is obvious that $\xi_n(\alpha)=\xi_n(\alpha^0)$.

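For $\alpha\in L$ and a vector-valued function $\beta$, with coordinates in $L$, such that the matrix $(\beta,m_\theta^T)$ is nonsingular (we require this for simplicity, although it is not necessary), let

$$\Pi\alpha = \alpha - (\alpha,m_\theta^T)(m_\theta,m_\theta^T)^{-1}m_\theta, \tag{2.16}$$

$$\Pi_\beta\alpha = \alpha - (\alpha,m_\theta^T)(\beta,m_\theta^T)^{-1}\beta. \tag{2.17}$$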

Proposition 2.4. (i) The linear transformation $\alpha\mapsto\alpha^0$ is an orthogonal projection in $L$ parallel to functions which are constant in $y$.

(ii) The linear transformation $\Pi_\beta$ (and therefore $\Pi$) is a projection. It projects parallel to $\beta$ onto a subspace of functions orthogonal to $m_\theta$. In particular, $\Pi$ is an orthogonal projection parallel to $m_\theta$.

(iii) The adjoint projectors $\Pi_\beta^*$ (and therefore $\Pi^*$) project parallel to $m_\theta$. For any two vector functions $\beta,\lambda$,

$$\Pi_\beta^*\Pi_\lambda^* = \Pi_\beta^*. \tag{2.18}$$

We can therefore say that, under the regularity conditions that guarantee the validity of the expansions (2.8), (2.11) and (2.13), the substitution of an M-estimator defined at (2.9) for $\theta$ in $\xi_n(\alpha;\theta)$ is asymptotically equivalent to projecting $\xi_n(\alpha;\theta)$ parallel to the linear functional $m_\theta$ generated by $\dot\mu_\theta$ and $\psi_f$. Similarly, the substitution of the MLE $\hat\theta$ for $\theta$ in $\xi_n(\alpha;\theta)$ is asymptotically equivalent to projecting $\xi_n(\alpha;\theta)$ orthogonally to $m_\theta$. Moreover, property (2.18) shows that the leading terms of $\xi_n(\alpha;\hat\theta_1)$ and $\xi_n(\alpha;\hat\theta_2)$, for any two estimators $\hat\theta_1,\hat\theta_2$ admitting the asymptotic linear representation (2.11), are in one-to-one correspondence with each other. Even though one of the estimators may be asymptotically more efficient than the other, (2.18) shows that the stocks of test statistics based on each of these processes are asymptotically the same. Therefore inference based on either $\xi_n(\alpha;\hat\theta_1)$ or $\xi_n(\alpha;\hat\theta_2)$ will be asymptotically indistinguishable.
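The algebra behind Proposition 2.4 and (2.18) can be checked numerically after replacing $L$ by functions on a finite grid of cells with positive masses, so that $(\alpha,\beta)$ becomes a weighted sum. In this sketch the grid, the masses and the functions $\alpha,\beta,\lambda,m_\theta$ are random stand-ins chosen only for illustration; `proj(alpha, beta, m)` implements $\Pi_\beta$ of (2.17), and $\Pi$ of (2.16) is `proj(alpha, m, m)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, q = 400, 2
w = rng.uniform(0.5, 1.5, n_cells)
w /= w.sum()                                   # masses of H x F on a hypothetical grid of cells


def inner(a, b):
    """Matrix of inner products (a_j, b_k) = sum_c w_c a_j(c) b_k(c); columns are functions."""
    return (a * w[:, None]).T @ b


def proj(alpha, beta, m):
    """Pi_beta alpha = alpha - (alpha, m^T)(beta, m^T)^{-1} beta, applied to each column of alpha."""
    G = inner(beta, m)                         # (q, q), entries (beta_i, m_j)
    r = inner(alpha, m)                        # (k, q), entries (alpha_k, m_j)
    coef = np.linalg.solve(G.T, r.T)           # (q, k); column k holds the beta-coefficients for alpha_k
    return alpha - beta @ coef


m_theta = rng.standard_normal((n_cells, q))
alpha = rng.standard_normal((n_cells, 3))
beta = m_theta + 0.5 * rng.standard_normal((n_cells, q))   # keeps (beta, m^T) well conditioned
lam = m_theta + rng.standard_normal((n_cells, q))

pb = proj(alpha, beta, m_theta)
print(np.abs(inner(pb, m_theta)).max())            # ~0: Pi_beta alpha is orthogonal to m_theta
print(np.abs(proj(pb, lam, m_theta) - pb).max())   # ~0: Pi_lambda Pi_beta = Pi_beta, i.e. (2.18)
print(np.abs(proj(pb, beta, m_theta) - pb).max())  # ~0: Pi_beta is idempotent
```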

We end this section by outlining the proofs of Propositions 2.1, 2.2 and 2.4. Throughout, $\varepsilon_i$ stands for $\varepsilon_i(\theta)$, $1\le i\le n$.

2.4. Some proofs. 

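Sketch of the proof of Proposition 2.1. We shall sketch details only for part (ii); those for part (i) are similar and simpler. Let $\Delta_i(v)=\mu(X_i,\theta+n^{-1/2}v)-\mu(X_i,\theta)$. Rewrite

$$\begin{aligned}
\xi_n(\alpha;\theta+n^{-1/2}v)-\xi_n(\alpha;\theta)
&= n^{-1/2}\sum_{i=1}^n\eta(X_i)I_B(X_i)\big[\varphi(\varepsilon_i-\Delta_i(v))-\varphi(\varepsilon_i)+\Delta_i(v)\varphi'(\varepsilon_i)\big]\\
&\quad - n^{-1/2}\sum_{i=1}^n\eta(X_i)I_B(X_i)\varphi'(\varepsilon_i)\big[\Delta_i(v)-n^{-1/2}\dot\mu_\theta^T(X_i)v\big]\\
&\quad - n^{-1}\sum_{i=1}^n\eta(X_i)I_B(X_i)\dot\mu_\theta^T(X_i)\varphi'(\varepsilon_i)\,v.
\end{aligned}$$

The condition (2.4) implies that

$$E\Big\{\sup_{\|v\|\le k}\sum_{i=1}^n\big[\Delta_i(v)-n^{-1/2}\dot\mu_\theta^T(X_i)v\big]^2\Big\}\to 0, \qquad \max_{1\le i\le n,\ \|v\|\le k}|\Delta_i(v)| = o_p(1).$$

These facts and (2.5) imply the conclusion (ii) in a routine fashion. □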

Before proving the next proposition, we recall from Hajek (1972) that (1.2) implies the mean-square differentiability of $f^{1/2}$:

$$\frac{f^{1/2}(y+\delta)}{f^{1/2}(y)} = 1+\frac{\delta}{2}\,\psi_f(y)+\rho_f(y,\delta), \qquad \int\rho_f^2(y,\delta)\,dF(y) = o(\delta^2), \quad \delta\to 0.$$

This fact is used implicitly in the following proof and throughout the paper.

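Proof of Proposition 2.2. Recall $\alpha(x,y)=\gamma(x)\varphi(y)$. Rewrite $\xi_n=\xi_{n0}+\xi_{n1}$, where

$$\xi_{n0}(\alpha;\vartheta) := n^{-1/2}\sum_{i=1}^n\gamma(X_i)\big[\varphi(\varepsilon_i(\vartheta))-E_\theta[\varphi(\varepsilon_i(\vartheta))\mid X_i]\big],$$

$$\xi_{n1}(\alpha;\vartheta) := n^{-1/2}\sum_{i=1}^n\gamma(X_i)\big[E_\theta[\varphi(\varepsilon_i(\vartheta))\mid X_i]-E_\theta\varphi(\varepsilon_i(\theta))\big].$$

Note that $\xi_{n0}(\alpha;\theta)=\xi_n(\alpha;\theta)$.

To prove Proposition 2.2 it thus suffices to show that, for every $0<k<\infty$,

$$\sup_{\|v\|\le k}\big|\xi_{n0}(\alpha;\theta+n^{-1/2}v)-\xi_{n0}(\alpha;\theta)\big| = o_p(1), \tag{2.19}$$

$$\sup_{\|v\|\le k}\big|\xi_{n1}(\alpha;\theta+n^{-1/2}v)-(\alpha,m_\theta^T)\,v\big| = o_p(1). \tag{2.20}$$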
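But (2.19) will follow from the equicontinuity of the process $\xi_{n0}(\alpha;\cdot)$:

$$\sup_{\|\vartheta-\theta\|\le\epsilon}\big|\xi_{n0}(\alpha;\vartheta)-\xi_{n0}(\alpha;\theta)\big| = o_p(1), \qquad n\to\infty,\ \epsilon\to 0.$$

This in turn follows from the argument below.

A $\varphi\in\Phi$ may be written as $\varphi=\varphi_1-\varphi_2$, where the nondecreasing functions $\varphi_1,\varphi_2$ both satisfy (2.7). Let $I_i := \operatorname{sign}(\gamma(X_i))$, $i=1,\dots,n$. Then for any $\delta>0$, for all $i=1,\dots,n$ and for all real $\Delta,\Delta'$ with $|\Delta'-\Delta|\le\delta$,

$$\gamma(X_i)\big[\varphi_1(Y_i-\Delta-\delta I_i)-\varphi_2(Y_i-\Delta+\delta I_i)\big] \le \gamma(X_i)\varphi(Y_i-\Delta') \le \gamma(X_i)\big[\varphi_1(Y_i-\Delta+\delta I_i)-\varphi_2(Y_i-\Delta-\delta I_i)\big].$$

The expected value of the square of the difference between the above upper and lower bounds is bounded from above by

$$2\int\gamma^2\,dH\;\nu^2(2\delta).$$

Therefore the bracketing entropy (the logarithm of the bracketing number) at level $\delta'$ is bounded by a constant multiple of $1+\big|\log\nu^{-1}\big(\delta'/[2\int\gamma^2\,dH]^{1/2}\big)\big|$, which is integrable by the definition of $\nu$. Therefore, by a result in van der Vaart and Wellner (1996, Sections 2.5.2, 2.7), (2.19) follows.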

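To prove (2.20), let, as above, $\Delta_i(v)=\mu(X_i,\theta+n^{-1/2}v)-\mu(X_i,\theta)$. Then one has

$$\begin{aligned}
\xi_{n1}(\alpha;\theta+n^{-1/2}v) &= n^{-1/2}\sum_{i=1}^n\gamma(X_i)\int\varphi(y)\big[f(y+\Delta_i(v))-f(y)\big]\,dy\\
&= n^{-1}\sum_{i=1}^n\gamma(X_i)\dot\mu_\theta^T(X_i)\int\varphi(y)\,\frac{f'(y)}{f(y)}\,dF(y)\,v + \rho_n(v)\\
&= (\alpha,m_\theta^T)\,v + \rho_n^1(v),
\end{aligned}$$

where, under the assumed conditions and using an argument similar to the one used, for example, in Hajek and Sidak (1967), one can show that $\sup_{\|v\|\le k}|\rho_n^1(v)| = o_p(1)$. □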

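Proof of Proposition 2.4. Let us prove part (iii) only. For a linear functional $S$ on $L$ write $\Pi_\beta^*S(\alpha) := S(\Pi_\beta\alpha)$. We need to show that $\Pi_\beta^*\Pi_\lambda^*S(\alpha)=\Pi_\beta^*S(\alpha)$. We have

$$\Pi_\beta^*\Pi_\lambda^*S(\alpha) = S(\Pi_\lambda\Pi_\beta\alpha) = S(\Pi_\beta\alpha)-(\Pi_\beta\alpha,m_\theta^T)(\lambda,m_\theta^T)^{-1}S(\lambda).$$

But, by definition,

$$(\Pi_\beta\alpha,m_\theta^T) = (\alpha,m_\theta^T)-(\alpha,m_\theta^T)(\beta,m_\theta^T)^{-1}(\beta,m_\theta^T) = 0.$$

Hence the last claim. Taking $\lambda=\beta$, it implies that $\Pi_\beta\Pi_\beta=\Pi_\beta$, that is, $\Pi_\beta$ is a projection. The remainder of the proof is obvious. □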

3. Limiting process and asymptotic power. 

3.1. The limiting process. Let $b(x,y)$, $x\in\mathbb{R}^p$, $y\in\mathbb{R}$, be a Brownian motion with covariance function $H(x\wedge x')F(y\wedge y')$, where $x\wedge x'$ is the vector with coordinates $\min(x_j,x'_j)$, $j=1,\dots,p$. In the discussion below all $\gamma$'s and $\varphi$'s are in $L_2(\mathbb{R}^p,H)$ and $L_2(\mathbb{R},F)$, respectively, that is, $\gamma\cdot\varphi\in L$. Define, for $\alpha(x,y)=\gamma(x)\varphi(y)$, the function-parametric Brownian motion

$$b(\alpha) := b(\gamma,\varphi) := \int\gamma(x)\varphi(y)\,b(dx,dy).$$

Clearly the class $\{b(\alpha):\alpha\in L\}$ is a family of zero mean Gaussian random variables with covariance given by

$$Eb(\alpha_1)b(\alpha_2) = (\alpha_1,\alpha_2).$$

Let

$$\xi(\alpha) := b(\gamma,\varphi)-(\varphi,1)\,b(\gamma,1) = b(\gamma,\varphi^0) = b(\alpha^0).$$

The family $\{\xi(\alpha):\alpha\in L\}$ is also a family of zero mean Gaussian random variables, with covariance

$$E\xi(\alpha_1)\xi(\alpha_2) = (\gamma_1,\gamma_2)\big[(\varphi_1,\varphi_2)-(\varphi_1,1)(\varphi_2,1)\big] = (\alpha_1^0,\alpha_2^0).$$

Thus, $\xi(\alpha)$ is a function-parametric Kiefer process in $\alpha$ and simply a Brownian motion in $\alpha^0$. Finally, define

$$\hat\xi(\alpha) := \xi(\alpha)-(\alpha,m_\theta^T)(m_\theta,m_\theta^T)^{-1}\xi(m_\theta) = \xi(\Pi\alpha).$$

Since $(\psi_f,1)=\int f'(y)\,dy=0$, we have $\xi(m_\theta)=b(m_\theta)$. Hence, $\hat\xi$ can be rewritten as

$$\hat\xi(\alpha) = b(\alpha^0)-(\alpha^0,m_\theta^T)(m_\theta,m_\theta^T)^{-1}b(m_\theta) = b(\Pi\alpha^0). \tag{3.1}$$

It seems easier to use below the notation $\alpha^\perp$ for $\Pi\alpha^0$:

$$\alpha^\perp = \alpha^0-(\alpha^0,m_\theta^T)(m_\theta,m_\theta^T)^{-1}m_\theta,$$

which is the part of $\alpha$ orthogonal to $1$ and $m_\theta$.

Here and everywhere below we will consider only the case of orthogonal 
projectors, which asymptotically correspond to the substitution of the MLE. 
As our comment after Proposition 2.4 shows, we can do this without loss of 
generality. 

In view of (2.14), the reason for introducing the processes $\xi$ and $\hat\xi$ is clear and is given by the following statement.
Proposition 3.1. Suppose that the conclusion (2.14) holds. Then the following holds for every $\alpha\in L$. Under $H_0$,

$$\xi_n(\alpha)\to_d\xi(\alpha), \qquad \hat\xi_n(\alpha)\to_d\hat\xi(\alpha). \tag{3.2}$$

Under the alternatives (1.3),

$$\xi_n(\alpha)\to_d\xi(\alpha)+(\alpha,a), \qquad \hat\xi_n(\alpha)\to_d\hat\xi(\alpha)+(\alpha,a)-(\alpha,m_\theta^T)(m_\theta,m_\theta^T)^{-1}(m_\theta,a). \tag{3.3}$$

Because both $\xi_n$ and $\hat\xi_n$ are linear in $\alpha$, the above proposition is equivalent to the weak convergence of all finite-dimensional distributions of these processes. Hence the possible weak limits of these processes are uniquely determined.

From (3.2) and (3.3) we see that the asymptotic shift of $\hat\xi_n$ under the alternatives (1.3) and (1.4) is simply $(\alpha,a)$ if $\alpha\perp m_\theta$, that is, if either $\gamma\perp\dot\mu_\theta$ or $\varphi\perp\psi_f$.

3.2. The case of $\gamma\perp\dot\mu_\theta$. In this case there exists an optimal choice of $\gamma$ which maximizes the asymptotic "signal to noise" ratio $\Delta$ of $\hat\xi_n(\gamma,\varphi)$ uniformly in $a$, that is, uniformly in the alternatives (1.3), where

$$\Delta := \frac{|(\alpha,a)|}{\|\alpha\|} = \frac{|(\gamma,1)|}{\|\gamma\|}\,\frac{|(\varphi,a)|}{\|\varphi\|}.$$

Here, too, we use the notation $1$ for the function in $x$ identically equal to 1. Clearly, the $\gamma$ that maximizes $\Delta$ uniformly in $a$ is the $\gamma$ that maximizes the ratio

$$\frac{|(\gamma,1)|}{\|\gamma\|} \tag{3.4}$$

subject to the condition $\gamma\perp\dot\mu_\theta$, and is given by

$$1^\perp := 1-\int\dot\mu_\theta^T\,dH\;C_\theta^{-1}\dot\mu_\theta = 1-(\dot\mu_\theta^T,1)\,C_\theta^{-1}\dot\mu_\theta.$$

On the other hand, the $\gamma$ that maximizes $\Delta$ or (3.4) among all $\gamma\in L_2(\mathbb{R}^p,H)$ is $1$. Thus $1^\perp$ is simply the part of the constant function $1$ orthogonal to $\dot\mu_\theta$. It follows that $1^\perp(x)\equiv 0$ when $\mu(x,\vartheta)$ is linear in $\vartheta$ and has a nonzero intercept.

Now consider $\hat\xi(1^\perp,\varphi)$ as a process in $\varphi$, assuming that $\|1^\perp\|\neq 0$. Since $1^\perp\cdot\varphi$ is orthogonal to $m_\theta$, from (3.1) we obtain

$$\hat\xi(1^\perp,\varphi) = b(1^\perp,\varphi^0).$$

It thus follows that $\hat\xi(1^\perp,\varphi)$ is a Brownian bridge in $\varphi$. If, for example, we choose $\varphi(y)=\varphi_t(y)=I(y\le F^{-1}(t))$, $0\le t\le 1$, then along the family of functions $\{\varphi_t(\cdot),\ 0\le t\le 1\}$ the process

$$u(t) := \hat\xi\Big(\frac{1^\perp}{\|1^\perp\|},\varphi_t\Big)$$

is a standard Brownian bridge with $Eu(s)u(t)=s\wedge t-st$.
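A prelimiting form of the process $u$ is

$$u_n(t) = \hat\xi_n\Big(\frac{1^\perp_n}{\|1^\perp_n\|_n},\varphi_t\Big) = n^{-1/2}\sum_{i=1}^n\frac{1^\perp_n(X_i)}{\|1^\perp_n\|_n}\big[I\{\varepsilon_i(\hat\theta)\le F^{-1}(t)\}-t\big],$$

where

$$1^\perp_n(x) := 1-(\dot\mu_{\hat\theta}^T,1)_n\,C_n^{-1}\dot\mu_{\hat\theta}(x), \quad x\in\mathbb{R}^p, \qquad \|1^\perp_n\|_n := \big(1-(\dot\mu_{\hat\theta}^T,1)_n\,C_n^{-1}(\dot\mu_{\hat\theta},1)_n\big)^{1/2},$$

and where $H_n$ is the empirical d.f. of the design variables $\{X_i,\ 1\le i\le n\}$, with respect to which the inner products $(\cdot,\cdot)_n$ and the matrix $C_n := \int\dot\mu_{\hat\theta}\dot\mu_{\hat\theta}^T\,dH_n$ are computed. One can verify, using, for example, the results from Koul (1996), that under the present setup $u_n$ converges weakly to a Brownian bridge. Hence, for instance, tests based on

$$\sup_t|u_n(t)| \qquad\text{or}\qquad \int_0^1 u_n^2(t)\,dt$$

will have asymptotically the well-known Kolmogorov and Cramer-von Mises distributions, respectively.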
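A minimal sketch of $u_n$ and of the two test statistics follows. It assumes an illustrative model not taken from the paper: $\mu(x,\vartheta)=\vartheta x$ with a scalar design (no intercept, so $1^\perp\not\equiv 0$) and $F$ standard normal. Then $\hat\theta$ from (2.12) is the least-squares estimator, $\dot\mu_{\hat\theta}(x)=x$, $(\dot\mu_{\hat\theta},1)_n$ is the sample mean of the $X_i$ and $C_n$ is the sample mean of the $X_i^2$. The sketch uses $I\{\varepsilon\le F^{-1}(t)\}=I\{F(\varepsilon)\le t\}$ for continuous $F$ and approximates the integral by an average over an equispaced grid; `u_n_statistics` is a hypothetical name.

```python
import math

import numpy as np


def std_normal_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def u_n_statistics(X, Y, t_grid):
    """u_n(t) on t_grid for mu(x, theta) = theta * x with F = N(0, 1); returns (u_n, sup|u_n|, int u_n^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    t = np.asarray(t_grid, dtype=float)
    n = X.size
    theta_hat = np.sum(X * Y) / np.sum(X * X)                      # least squares = MLE for normal F
    U = np.array([std_normal_cdf(e) for e in Y - theta_hat * X])   # F(eps_i(theta_hat))
    mean_x, c_n = X.mean(), np.mean(X * X)                         # (mu_dot, 1)_n and C_n
    one_perp = 1.0 - (mean_x / c_n) * X                            # 1^perp_n(X_i)
    norm = math.sqrt(1.0 - mean_x ** 2 / c_n)                      # ||1^perp_n||_n (> 0 unless X is constant)
    indic = (U[:, None] <= t[None, :]).astype(float)               # I(eps_i(theta_hat) <= F^{-1}(t))
    u = one_perp @ (indic - t[None, :]) / (norm * math.sqrt(n))
    return u, float(np.max(np.abs(u))), float(np.mean(u ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.uniform(0.0, 2.0, 300)
    Y = 1.5 * X + rng.standard_normal(300)
    u, ks, cvm = u_n_statistics(X, Y, np.linspace(0.0, 1.0, 201))
    print(ks, cvm)   # to be compared with Kolmogorov and Cramer-von Mises critical values
```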

Now suppose that the design d.f. $H$ and the regression function $\dot\mu_\theta$ are such that

$$(\dot\mu_\theta,1) = \int\dot\mu_\theta\,dH = 0. \tag{3.5}$$

Then $1^\perp\equiv 1$, and $u_n=W_n(F^{-1})$, the ordinary empirical process of the residuals, whose weak convergence to a Brownian bridge can also be derived from Koul (1996) under the present setup.

There is, however, a drawback in the choice $\gamma\perp\dot\mu_\theta$: although, as we have seen, this choice of $\gamma$ makes the asymptotic behavior of $\hat\xi_n$ in $\varphi$ simple, tests based on the process $\hat\xi_n(\|1^\perp\|^{-1}1^\perp,\varphi)$ will in general have some loss of asymptotic power. Consider for the moment the problem of testing $H_0$ vs. the alternative (1.3) for a given $a$ when $\theta$ is known. Then the shift function that will appear in the asymptotic power for $\xi_n(\gamma,\varphi)$ is $|(\gamma,1)(\varphi,a)|/\|\gamma\|\|\varphi\|$. This attains its maximum in $\gamma$ at $\gamma=1$. However, for the process $\hat\xi_n(1^\perp,\varphi)$ the corresponding shift is uniformly smaller in absolute value:

$$\frac{|(1^\perp,1)|}{\|1^\perp\|}\,\frac{|(\varphi,a)|}{\|\varphi\|} = \|1^\perp\|\,\frac{|(\varphi,a)|}{\|\varphi\|} \le \frac{|(\varphi,a)|}{\|\varphi\|},$$

and, in particular, the statistic $\hat\xi_n(1^\perp,\varphi)$ will have smaller asymptotic power against the alternative $a$ than the statistic $\xi_n(1,\varphi)$. The actual loss may be quite small, depending on the quantity

$$1-\|1^\perp\|^2 = (\dot\mu_\theta^T,1)\,C_\theta^{-1}(\dot\mu_\theta,1),$$

and may actually equal 0 if (3.5) holds. But, in general, there is some loss.

We shall see in Section 6 that the choice $\gamma\perp\dot\mu_\theta$ will become most natural when fitting a regression model. However, one should not think that the loss of power associated with this choice in testing the hypothesis $H_0$ is unavoidable due to the estimation of the nuisance parameters. On the contrary, estimation of the parameter may lead to an increase of power against "most" alternatives. We will see this better in the next section.

Finally, we remark that the geometric picture, similar to the one depicted 
by Propositions 2.4 and 3.1 and also in this and the next sections, was 
developed in the context of the parametric empirical processes in Khmal- 
adze (1979). See also the monograph by Bickel, Klaassen, Ritov and Wellner 
(1998) describing the related geometry in connection with efficient and adap- 
tive estimation in semiparametric models. 

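3.3. The case of $\varphi\perp\psi_f$. This case is important for two reasons. The first is that in this case again $\hat\xi(\gamma,\varphi)=\xi(\gamma,\varphi)$, that is, the asymptotic behavior of the processes $\xi_n(\alpha)$ and $\hat\xi_n(\alpha)$ under $H_0$ is the same. The second is that, if we assume that the $a$ of (1.3) also satisfies (1.4), then there is in general a gain in the signal to noise ratio if we choose $\varphi$ orthogonal to $\psi_f$. Indeed, let $\varphi^\perp$ denote the part of $\varphi$ orthogonal to $\psi_f$ and $1$. The signal to noise ratio for $\xi_n(\gamma,\varphi^\perp)$ is asymptotically larger than that for $\xi_n(\gamma,\varphi)$, as is seen from the following elementary argument, in which the equality uses $a\perp 1$ and $a\perp\psi_f$ from (1.3) and (1.4):

$$\frac{|(\gamma,1)|}{\|\gamma\|}\,\frac{|(\varphi,a)|}{\|\varphi^0\|} = \frac{|(\gamma,1)|}{\|\gamma\|}\,\frac{|(\varphi^\perp,a)|}{\|\varphi^0\|} \le \frac{|(\gamma,1)|}{\|\gamma\|}\,\frac{|(\varphi^\perp,a)|}{\|\varphi^\perp\|},$$

because $\|\varphi^0\|\ge\|\varphi^\perp\|$.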

It is also obvious that the optimal choice of $\gamma$ that maximizes $\Delta$ uniformly in $a$ is $\gamma=1$. Therefore, consider the process

$$\hat\xi(1,\varphi) = \xi(1,\varphi) = b(1,\varphi) \tag{3.6}$$

as a process in $\varphi$, for $\varphi$ satisfying $\varphi\perp\psi_f$ and $\varphi\perp 1$. From (3.6) it is clear that, if we had a family of functions $\{\varphi_t,\ 0\le t\le 1\}$ from $L_2(\mathbb{R},F)$ such that

$$(\varphi_t,1) = (\varphi_t,\psi_f) = 0, \tag{3.7}$$

$$(\varphi_t,\varphi_t) = t, \qquad 0\le t\le 1, \tag{3.8}$$

$$(\varphi_{t_2}-\varphi_{t_1},\varphi_{t_1}) = 0, \qquad t_2\ge t_1, \tag{3.9}$$

then the process $\hat\xi(1,\varphi_t)$, $0\le t\le 1$, would be a Brownian motion in $0\le t\le 1$. Hence, all tests based on

$$n^{-1/2}\sum_{i=1}^n\varphi_t(\varepsilon_i(\hat\theta)), \qquad 0\le t\le 1,$$

will be ADF.

It is straightforward to construct a family of functions satisfying (3.8) and (3.9). For example, take any function $\varphi$ from $L_2(\mathbb{R},F)$ such that $L(y) := \int_{-\infty}^{y} \varphi^2\,dF$ is a continuous distribution function on $\mathbb{R}$, and $\varphi^2 f > 0$ a.e. Then the family

$$\varphi_t(y) := \varphi(y)\,I\{y < L^{-1}(t)\}, \qquad L^{-1}(t) := \inf\{y \in \mathbb{R} : L(y) > t\}, \quad 0 \le t \le 1, \tag{3.10}$$

satisfies these conditions. However, finding a family $\{\varphi_t, 0 \le t \le 1\}$ that satisfies (3.7) as well is far less straightforward. It is here that we will exploit the "martingale transform" ideas of Khmaladze (1981, 1993).
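A minimal numerical sketch of the family (3.10), under illustrative assumptions that are not taken from the text: $F$ is standard normal with density $f$, and $\varphi(y) = y$, so that $L(y) = F(y) - y f(y)$ is a continuous d.f. The code checks (3.8) by Simpson quadrature and shows that (3.7) fails for this family, since $(\varphi_t, 1) = -f(L^{-1}(t)) \ne 0$; condition (3.9) holds trivially because $\varphi_{t_2} - \varphi_{t_1}$ and $\varphi_{t_1}$ have disjoint supports. The helper names `big_l`, `l_inverse`, `simpson` and `phi_t_moments` are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()  # illustrative choice: F = standard normal


def big_l(y):
    """L(y) = int_{-inf}^y phi^2 dF for phi(z) = z and F = N(0, 1), i.e. F(y) - y f(y)."""
    return ND.cdf(y) - y * ND.pdf(y)


def l_inverse(t, lo=-12.0, hi=12.0, iters=200):
    """L^{-1}(t) = inf{y : L(y) > t}, found by bisection (L is continuous, increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if big_l(mid) > t:
            hi = mid
        else:
            lo = mid
    return hi


def simpson(fn, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = fn(a) + fn(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * fn(a + k * step)
    return total * step / 3.0


def phi_t_moments(t):
    """Return ((phi_t, phi_t), (phi_t, 1)) for phi_t(y) = y I{y < L^{-1}(t)} of (3.10)."""
    cut = l_inverse(t)
    sq = simpson(lambda y: y * y * ND.pdf(y), -12.0, cut)
    mean = simpson(lambda y: y * ND.pdf(y), -12.0, cut)
    return sq, mean


for t in (0.25, 0.5, 0.9):
    sq, mean = phi_t_moments(t)
    # (3.8) predicts sq == t; (3.7) would require mean == 0, but here mean = -f(L^{-1}(t)).
    print(t, round(sq, 6), round(mean, 6))
```

The lower limit $-12$ stands in for $-\infty$; the neglected normal mass there is far below double-precision resolution.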

4. A martingale transform. Let $h(y) := (1, \psi_f(y))^T$ be an extended score function of the error distribution and set

$$\Gamma_t := \int_{z \ge y} h(z)\,h^T(z)\,dF(z) = \begin{pmatrix} 1 - F(y) & -f(y) \\ -f(y) & \int_{z \ge y}\psi_f^2(z)\,dF(z) \end{pmatrix}, \qquad t = F(y).$$

The matrix $\Gamma_t$ will be assumed to be nonsingular for every $0 \le t < 1$. This is true if and only if $1$ and $\psi_f(y)$ are linearly independent on the set $\{y > c\}$ for all sufficiently large $c$, which in turn holds if $\psi_f$ is not constant in the right tail of the support of $f$. Then the unique inverse $\Gamma_t^{-1}$ exists for every $0 \le t < 1$. [The case when $\Gamma_t$ is not uniquely invertible does not, however, create much of a problem for the transformation (4.1), as is shown in Tsigroshvili (1998).]

Now, observe that the condition (3.7) above is equivalent to requiring that $\varphi$ be orthogonal to the vector $h$. For a function $\varphi \in L_2(\mathbb{R},F)$, consider the transformation

$$\mathcal{C}\varphi(y) := \varphi(y) - \int_{z<y}\varphi(z)\,h^T(z)\,\Gamma_{F(z)}^{-1}\,dF(z)\;h(y), \qquad y \in \mathbb{R}. \tag{4.1}$$
Let, for $\alpha = (\gamma,\varphi) \in \mathcal{L}$, $w(\alpha) := \xi(\gamma,\mathcal{C}\varphi)$.

We have the following:

Proposition 4.1. Let $\mathcal{H} := \{\varphi \in L_2(\mathbb{R},F) : (\varphi, h) = 0\}$. The transformation $\mathcal{C}$ of (4.1) is a norm preserving transformation from $L_2(\mathbb{R},F)$ to $\mathcal{H}$:

$$\mathcal{C}\varphi \perp h, \qquad \|\mathcal{C}\varphi\| = \|\varphi\|.$$

Consequently the process $w(\alpha)$ is a (function parametric) Brownian motion on $\mathcal{L}$.

A consequence of this proposition is the following corollary: 

Corollary 4.1. Suppose $\{\varphi_t, 0 \le t \le 1\}$ is a family of functions satisfying the conditions (3.8) and (3.9). Then $\{\mathcal{C}\varphi_t, 0 \le t \le 1\}$ is a family of functions satisfying all three conditions (3.7)-(3.9). Consequently, $\{\xi(\gamma,\mathcal{C}\varphi_t), 0 \le t \le 1\}$, for any fixed $\gamma$ with $\|\gamma\| = 1$, is a standard Brownian motion in $t$.

Now, if $\{\xi_n(\gamma,\mathcal{C}\varphi_t), 0 \le t \le 1\}$ converges weakly to $\{\xi(\gamma,\mathcal{C}\varphi_t), 0 \le t \le 1\}$, then tests based on any continuous functionals of $\xi_n(\gamma,\mathcal{C}\varphi_t)$ will be ADF for testing $H_0$. Some general sufficient conditions for the weak convergence of $\{\xi_n(\gamma,\mathcal{C}\varphi_t), 0 \le t \le 1\}$ can be drawn from Proposition 6.2. Others can be inferred from, for example, van der Vaart and Wellner (1996). In particular, these claims hold for the family $\{\varphi_t, 0 \le t \le 1\}$ given at (3.10).

It is also important to note that the transformation $\mathcal{C}$ does not depend on $\gamma$ and, hence, the statement concerning the asymptotic distribution of $\{\xi_n(\gamma,\mathcal{C}\varphi_t), 0 \le t \le 1\}$ is valid for any $\gamma \in L_2(\mathbb{R}^p, H)$.

Another consequence of Proposition 4.1 is worth formulating separately. 

Proposition 4.2. Let $\hat\theta$ be any estimator which satisfies (2.3) [and does not necessarily have a linear representation (2.11)] and let $\psi_f$ be a function of bounded variation. If, additionally, (1.2) and (2.4) hold, then for every $\alpha = \gamma\cdot\varphi$ with $\gamma \in L_2(\mathbb{R}^p, H)$ and $\varphi \in \Phi$, under $H_0$,

$$\xi_n(\gamma,\mathcal{C}\varphi) \to_d w(\alpha),$$

while under the alternatives (1.3),

$$\xi_n(\gamma,\mathcal{C}\varphi) \to_d w(\alpha) + (\mathcal{C}\alpha, a).$$

This proposition shows that although we used the asymptotically linear representations (2.11) and (2.13) of $\hat\theta$ and $\tilde\theta$ to develop the previous theory, the behavior of $\hat\theta$ and $\tilde\theta$ plays only a minor role in the asymptotic behavior of the transformed processes $\xi_n(\gamma,\mathcal{C}\varphi)$ and $\tilde\xi_n(\gamma,\mathcal{C}\varphi)$.

It is instructive to consider informally a probabilistic connection between the processes $\xi(\gamma,\varphi_t)$ and $\xi(\gamma,\mathcal{C}\varphi_t)$. Let us associate with $\{\xi(\gamma,\varphi_t), 0 \le t \le 1\}$ its natural filtration $\{\mathcal{F}_t, 0 \le t \le 1\}$, where each $\sigma$-field is

$$\mathcal{F}_t := \sigma\{\xi(\gamma,\varphi_s), 0 \le s \le t\},$$

and consider the filtered process $\{\xi(\gamma,\varphi_t), \mathcal{F}_t, 0 \le t \le 1\}$. This is in $t$ a Gaussian semimartingale, and it can be shown that the process $\{\xi(\gamma,\mathcal{C}\varphi_t), \mathcal{F}_t, 0 \le t \le 1\}$ is actually its martingale part. In other words, if $\mathcal{V}$ denotes the Volterra operator defined by the integral on the right-hand side of (4.1), so that $\mathcal{C}\varphi = \varphi - \mathcal{V}\varphi$, then the identity

$$\xi(\gamma,\varphi_t) = \xi(\gamma,\mathcal{V}\varphi_t) + \xi(\gamma,\mathcal{C}\varphi_t), \qquad 0 \le t \le 1, \tag{4.2}$$

is simply the Doob-Meyer decomposition of the process $\{\xi(\gamma,\varphi_t), \mathcal{F}_t, 0 \le t \le 1\}$.

Details of this decomposition can be found in Khmaladze (1993), where the general construction of this form for a function-parametric process was introduced and studied. The notion of the Doob-Meyer decomposition for a semimartingale can be found, for example, in Liptser and Shiryayev (1977).

Remark 4.1. Since $\mathcal{C}\varphi$ is orthogonal to $1$ and to $\psi_f$, the equality (4.2) can be rewritten in terms of the process $b$:

$$b(\gamma,\varphi_t) = b(\gamma,\mathcal{V}\varphi_t) + b(\gamma,\mathcal{C}\varphi_t). \tag{4.3}$$

To some extent this is an unusual equation, because both processes $b(\gamma,\varphi_t)$ and $b(\gamma,\mathcal{C}\varphi_t)$, taken separately, are Brownian motions. However, the nature of (4.3) can be understood more clearly as follows: let $\{\mathcal{F}_t, 0 \le t \le 1\}$ be the natural filtration of the process $b(\gamma,\varphi_t)$ in $t$ and let us enrich it with the $\sigma$-field $\sigma\{b(\gamma,h)\}$. Then the process $\{b(\gamma,\varphi_t), \mathcal{F}_t \vee \sigma\{b(\gamma,h)\}, 0 \le t \le 1\}$ is a Gaussian semimartingale (and not a martingale), and (4.3) is its Doob-Meyer decomposition. See, for example, Liptser and Shiryayev (1989) for more details on this.

Remark 4.2. Another consequence of the orthogonality of $\mathcal{C}\varphi_t$ to $1$ and to $\psi_f$ is this: the process $\xi(\gamma,\varphi_t)$ with $\varphi_t$ chosen according to (3.10) with a nonconstant $\varphi$ is not a Brownian bridge (because in this case the projection of $\varphi_1$ onto the span of $1$ and $\psi_f$ has squared norm strictly smaller than $\|\varphi_1\|^2 = 1$), and hence even the process $\xi_n(\gamma,\varphi_t)$ with a known value of the parameter, and statistics based on it, may have an inconvenient limiting distribution; nevertheless, the transformed process $\xi(\gamma,\mathcal{C}\varphi_t)$ is the standard Brownian motion for any such choice of $\varphi_t$.






We shall now describe an analog of the above transformation suitable for testing the hypothesis $H_\sigma$: $G(y) = F(y/\sigma)$ for all $y \in \mathbb{R}$ and for some $\sigma > 0$. Let $\hat\sigma$ be an estimate of $\sigma$ based on $\{(X_i, Y_i), 1 \le i \le n\}$ satisfying

$$|n^{1/2}(\hat\sigma - \sigma)| = O_p(1). \tag{4.4}$$

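The analog of the processes $\xi_n$ here is

$$\xi_{n\sigma}(\gamma,\varphi) := n^{-1/2}\sum_{i=1}^n \gamma(X_i)\Big[\varphi\Big(\frac{Y_i - \mu(X_i,\hat\theta)}{\hat\sigma}\Big) - \int\varphi\,dF\Big].$$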



To transform its weak limit $\xi_\sigma$ under $H_\sigma$, again define an extended score function of $F((y-\mu)/\sigma)$ with respect to both parameters $\mu$ and $\sigma$, which is $h_\sigma(y) = (1, \psi_{f\mu}(y/\sigma), \psi_{f\sigma}(y/\sigma))^T$, where obviously

$$\psi_{f\sigma}\Big(\frac{y}{\sigma}\Big) = \frac{y}{\sigma}\,\psi_{f\mu}\Big(\frac{y}{\sigma}\Big).$$

The analog of the matrix $\Gamma_t$ is then the $3 \times 3$ matrix

$$\Gamma_{\sigma,t} := \int_{z \ge y} h_\sigma(z)\,h_\sigma^T(z)\,dF(z/\sigma), \qquad t = F(y/\sigma).$$

Again, assume that $\Gamma_{\sigma,t}^{-1}$ exists for all $0 \le t < 1$. Then, as above, let

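$$\mathcal{C}_\sigma\varphi(y) := \varphi(y) - \int_{z<y}\varphi(z)\,h_\sigma^T(z)\,\Gamma_{\sigma,F(z/\sigma)}^{-1}\,dF(z/\sigma)\;h_\sigma(y), \qquad y \in \mathbb{R}. \tag{4.5}$$

One can show that $\mathcal{C}_\sigma$ is a norm preserving transformation from $L_2(\mathbb{R},F)$ to the subspace $\mathcal{H}_\sigma = \{\varphi \in L_2(\mathbb{R},F) : (\varphi, h_\sigma) = 0\}$, and hence $\xi_\sigma(\gamma,\mathcal{C}_\sigma\varphi)$ is a Brownian motion on $\mathcal{L}$.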



Proof of Proposition 4.1. Though we could refer to the proof of 
Proposition 6.1, for presentational purposes it seems more convenient to 
give it here separately. Let, within this proof only, $\psi(t) := \varphi(F^{-1}(t))$ and $g(t) := h(F^{-1}(t))$ for $0 \le t \le 1$, so that $\Gamma_t = \int_t^1 g(s)g^T(s)\,ds$. Then

$$\int\mathcal{C}\varphi(y)\,h^T(y)\,dF(y) = \int_0^1\psi(t)g^T(t)\,dt - \int_0^1\int_0^t\psi(s)g^T(s)\Gamma_s^{-1}\,ds\;g(t)g^T(t)\,dt$$

$$= \int_0^1\psi(t)g^T(t)\,dt - \int_0^1\psi(s)g^T(s)\Gamma_s^{-1}\int_s^1 g(t)g^T(t)\,dt\,ds$$

$$= \int_0^1\psi(t)g^T(t)\,dt - \int_0^1\psi(s)g^T(s)\,ds = 0.$$

For the technical justification of the interchange of integration in the second equation above see the proof of Proposition 6.1 below or Khmaladze (1993). Similarly, we also have

$$\int(\mathcal{C}\varphi)^2\,dF = \int_0^1\psi^2(s)\,ds - 2\int_0^1\psi(s)g^T(s)\Gamma_s^{-1}\int_s^1 g(t)\psi(t)\,dt\,ds + \int_0^1\int_0^1\psi(s)g^T(s)\Gamma_s^{-1}\Gamma_{s\vee t}\Gamma_t^{-1}g(t)\psi(t)\,ds\,dt = \int\varphi^2\,dF.$$

□
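A discretized sanity check of Proposition 4.1, under illustrative assumptions that are not from the text: $F$ is standard normal, $\psi_f$ is taken as $f'/f$ so that $h(y) = (1, -y)^T$ (flipping the sign of the second coordinate of $h$ would not affect the check), $\varphi(y) = I\{y \le F^{-1}(0.8)\}$, and integrals in $t = F(y)$ are replaced by midpoint sums. When $\Gamma$ is built from the same sums and the inner sum runs over $j \le k$, the orthogonality $\mathcal{C}\varphi \perp h$ holds exactly up to rounding, while $\|\mathcal{C}\varphi\| = \|\varphi\|$ holds up to an $O(1/N)$ discretization term. The function name `khmaladze_check` is hypothetical.

```python
from statistics import NormalDist


def khmaladze_check(t_cut=0.8, n_grid=20000):
    """Discretized check of Proposition 4.1 for F = N(0, 1), phi = I{y <= F^{-1}(t_cut)}.

    On the t-scale: psi(t) = phi(F^{-1}(t)), g(t) = h(F^{-1}(t)) = (1, -F^{-1}(t)).
    Returns ((C psi, g_1), (C psi, g_2), ||C psi||^2, ||psi||^2), all w.r.t. dt.
    """
    nd = NormalDist()
    dt = 1.0 / n_grid
    ts = [(k + 0.5) * dt for k in range(n_grid)]  # midpoint grid on (0, 1)
    g = [(1.0, -nd.inv_cdf(t)) for t in ts]
    psi = [1.0 if t <= t_cut else 0.0 for t in ts]

    # Gamma_k = sum_{j >= k} g_j g_j^T dt, the discrete analogue of int_t^1 g g^T ds.
    gam = [None] * n_grid
    s11 = s12 = s22 = 0.0
    for k in range(n_grid - 1, -1, -1):
        g1, g2 = g[k]
        s11 += g1 * g1 * dt
        s12 += g1 * g2 * dt
        s22 += g2 * g2 * dt
        gam[k] = (s11, s12, s22)

    # C psi_k = psi_k - V_k . g_k, with V_k = sum_{j <= k} psi_j g_j^T Gamma_j^{-1} dt.
    # psi vanishes for t > t_cut, so Gamma_j is only inverted where it is well conditioned.
    c_psi = []
    v1 = v2 = 0.0
    for k in range(n_grid):
        g1, g2 = g[k]
        if psi[k] != 0.0:
            a11, a12, a22 = gam[k]
            det = a11 * a22 - a12 * a12
            # row vector g^T Gamma^{-1}, with Gamma^{-1} = [[a22, -a12], [-a12, a11]] / det
            v1 += psi[k] * (g1 * a22 - g2 * a12) / det * dt
            v2 += psi[k] * (g2 * a11 - g1 * a12) / det * dt
        c_psi.append(psi[k] - (v1 * g1 + v2 * g2))

    inner1 = sum(c * gk[0] for c, gk in zip(c_psi, g)) * dt
    inner2 = sum(c * gk[1] for c, gk in zip(c_psi, g)) * dt
    norm_c = sum(c * c for c in c_psi) * dt
    norm_psi = sum(p * p for p in psi) * dt
    return inner1, inner2, norm_c, norm_psi


print(khmaladze_check())
```

Restricting $\varphi$ to $t \le 0.8$ sidesteps the near-singularity of $\Gamma_t$ as $t \to 1$, which in the continuous theory is handled by the nonsingularity assumption for $t < 1$.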



Proof of Proposition 4.2. If $\psi_f$ is a function of bounded variation and $\varphi \in \Phi$, then $\mathcal{C}\varphi \in \Phi$ and therefore we can use (2.8). Together with the orthogonality property $\mathcal{C}\varphi \perp \psi_f$, which implies that $(\mathcal{C}\alpha, m_\theta) = 0$, we obtain that

$$\xi_n(\mathcal{C}\alpha) = \xi_n^0(\mathcal{C}\alpha) + o_p(1),$$

and the rest follows from Proposition 4.1, the CLT for $\xi_n^0(\mathcal{C}\alpha)$ and a standard contiguity argument. □

5. Some explicit formulas and remarks. 

5.1. Transformation of the processes $U_n$ and $\xi_n(1,\varphi_t)$. In this section we shall apply the above transformation to residual empirical processes and give computational formulae of the transformed processes for testing $H_0$ and $H_\sigma$.

Recall from the previous sections that, for $0 \le t \le 1$,

$$U_n(t) := W_n(F^{-1}(t)) = n^{-1/2}\sum_{i=1}^n\big[I\{\varepsilon_i(\hat\theta) \le F^{-1}(t)\} - t\big], \tag{5.1}$$

$$\xi_n(1,\varphi_t) = n^{-1/2}\sum_{i=1}^n\Big[\varphi(\varepsilon_i(\hat\theta))\,I\{\varepsilon_i(\hat\theta) \le F^{-1}(t)\} - \int_{y \le F^{-1}(t)}\varphi(y)\,dF(y)\Big], \tag{5.2}$$

where in (5.2) $\varphi_t(y) = \varphi(y)I\{y \le F^{-1}(t)\}$. Note that $U_n(t)$ also corresponds to $\xi_n(1,\varphi_t)$ with $\varphi_t(y) = I\{y \le F^{-1}(t)\}$. As another practically useful consequence of the orthogonality of $\mathcal{C}\varphi$ to $1$, we have the following equality:

$$\xi_n(1,\mathcal{C}\varphi) = n^{-1/2}\sum_{i=1}^n\mathcal{C}\varphi(\varepsilon_i(\hat\theta)).$$

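It means that we only need to construct transformations of the random summands in (5.1) and (5.2). Introduce the vector functions

$$G(z) := \int_{-\infty}^{z}\Gamma_{F(y)}^{-1}h(y)\,dF(y), \qquad J(z) := \int_{-\infty}^{z}\varphi(y)\,\Gamma_{F(y)}^{-1}h(y)\,dF(y), \qquad z \in \mathbb{R}.$$

Then the transformation $\mathcal{C}$ of (4.1) applied to $U_n$ of (5.1) gives

$$w_{n1}(t) = n^{-1/2}\sum_{i=1}^n\big[I\{\varepsilon_i(\hat\theta) \le z\} - (1, \psi_f(\varepsilon_i(\hat\theta)))\,G(z \wedge \varepsilon_i(\hat\theta))\big], \qquad t = F(z), \tag{5.3}$$

while the transformation of (5.2) is

$$w_{n2}(t) = n^{-1/2}\sum_{i=1}^n\big[\varphi(\varepsilon_i(\hat\theta))\,I\{\varepsilon_i(\hat\theta) \le z\} - (1, \psi_f(\varepsilon_i(\hat\theta)))\,J(z \wedge \varepsilon_i(\hat\theta))\big], \qquad t = F(z). \tag{5.4}$$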
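A short calculation shows why the compensating term in (5.3) centers each summand. It is a sketch under simplifying assumptions: the residual is replaced by the true error $\varepsilon$ with d.f. $F$, $\Gamma_{F(y)}$ is invertible for $y \le z$, and the order of integration may be interchanged. Since the first coordinate of $h$ is $1$, the first row of $\Gamma_{F(y)} = \int_{u \ge y} h h^T\,dF$ is $\int_{u \ge y} h^T(u)\,dF(u) = E[I\{\varepsilon \ge y\}h^T(\varepsilon)]$, so, with $e_1 = (1,0)^T$,

$$E\big[h^T(\varepsilon)G(z \wedge \varepsilon)\big] = \int_{-\infty}^{z} E\big[I\{\varepsilon \ge y\}h^T(\varepsilon)\big]\,\Gamma_{F(y)}^{-1}h(y)\,dF(y) = \int_{-\infty}^{z} e_1^T\,\Gamma_{F(y)}\Gamma_{F(y)}^{-1}h(y)\,dF(y) = \int_{-\infty}^{z} dF(y) = F(z).$$

Hence $E\big[I\{\varepsilon \le z\} - h^T(\varepsilon)G(z \wedge \varepsilon)\big] = 0$ for every $z$.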

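Similarly, to describe ADF tests for $H_\sigma$ based on the analogs of $w_{n1}$ and $w_{n2}$, let now $\tilde\varepsilon_i = \varepsilon_i(\hat\theta)/\hat\sigma$, and let us consider the processes

$$n^{-1/2}\sum_{i=1}^n\big[I\{\tilde\varepsilon_i \le z\} - F(z)\big], \qquad n^{-1/2}\sum_{i=1}^n\Big[\varphi(\tilde\varepsilon_i)\,I\{\tilde\varepsilon_i \le z\} - \int_{y \le z}\varphi(y)\,dF(y)\Big], \qquad t = F(z),\ z \in \mathbb{R}.$$

Then, arguing as above, we are led to the following respective computational formulae:

$$w_{n1}(t) = n^{-1/2}\sum_{i=1}^n\big[I\{\tilde\varepsilon_i \le z\} - h_\sigma^T(\tilde\varepsilon_i)\,G_\sigma(z \wedge \tilde\varepsilon_i)\big],$$

$$w_{n2}(t) = n^{-1/2}\sum_{i=1}^n\big[\varphi(\tilde\varepsilon_i)\,I\{\tilde\varepsilon_i \le z\} - h_\sigma^T(\tilde\varepsilon_i)\,J_\sigma(z \wedge \tilde\varepsilon_i)\big], \qquad t = F(z),\ z \in \mathbb{R},$$

where $h_\sigma(y)$ is as in the previous section, while $G_\sigma$ and $J_\sigma$ are defined as $G$ and $J$ with $h$ replaced by $h_\sigma$ and $\Gamma$ replaced by $\Gamma_\sigma$.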

These formulae may be used in the computation of any test statistic based on continuous functionals of $w_{n1}$ and $w_{n2}$. From the theory developed above, if these functionals are invariant under the usual time transformation $t = F(y)$, they will be ADF!

5.2. Nonrandom design. We now state some analogous facts for the case of a nonrandom design, where the design vectors are now denoted by $x_{ni}$. An analog of the condition (2.4) here is as follows: there exist a $q$-vector function $\dot\mu$ on $\mathbb{R}^p \times \Theta$ and a $q \times q$ positive definite symmetric matrix $\Sigma$ such that

$$\max_{1 \le i \le n} n^{-1/2}\|\dot\mu(x_{ni},\theta)\| = o(1), \tag{5.5}$$

$$\sup_{1 \le i \le n,\ n^{1/2}\|\vartheta - \theta\| \le k} n^{1/2}\big|\mu(x_{ni},\vartheta) - \mu(x_{ni},\theta) - (\vartheta - \theta)^T\dot\mu(x_{ni},\theta)\big| = o(1).$$

Under these conditions on the regression function and the rest of the conditions as before, the analogs of the above results with $\mu(X_i,\cdot)$ replaced by $\mu(x_{ni},\cdot)$ remain valid in the present case. Using the results from Koul (1996), it is possible to obtain the analogs of the expansions (2.14) and (2.15) under more general conditions on the function $\mu$ than those given in (5.5), but we refrain from doing this for the sake of brevity and so as not to obscure the main ideas.

A similar remark applies to the linear regression model. In particular, in the case of nonrandom and general designs, but with the $n \times p$ design matrix $X$ of rank $p$, just replace $n^{-1/2}X_i$ in the above formulas by $(X'X)^{-1/2}x_{ni}$, $1 \le i \le n$, everywhere. Then tests based on the analogues of $w_{n1}$ and $w_{n2}$ are ADF for $H_0$, provided $\max_{1 \le i \le n} n^{1/2}\|(X'X)^{-1/2}x_{ni}\| = O(1)$.

5.3. Autoregressive time series. Because of the close connection between regression and autoregressive models, analogues of the above ADF tests pertaining to the error distribution are easy to see in this case. Accordingly, suppose $Y_i$, $i \in \mathbb{Z} := \{0, \pm1, \pm2, \ldots\}$, is now an observable stationary and ergodic time series. Let $\mu$ be as before satisfying (1.1) with $X_i := (Y_{i-1}, \ldots, Y_{i-p})^T$, where $p \ge 1$ is a known integer. Then the above tests with this $X_i$ will again be ADF for testing $H_0$. A rigorous proof of this claim is similar to that appearing above, with the proviso that one uses the ergodic theorem in place of the law of large numbers, and the CLT for martingale differences in place of the Lindeberg-Feller CLT. Note that now $H$ is the d.f. of the random vector $X_0$.




In the case of a stationary and ergodic linear AR(p) model, that is, when $\mu(x,\vartheta) = x'\vartheta$, if the null error d.f. $F$ has mean zero and finite variance, then $EX_0 = 0$, that is, (3.5) is automatically satisfied, and hence tests based on the analog of $U_n$ of (5.1) will be a priori ADF for $H_0$. This was first proved in Boldin (1982), assuming $F$ has a bounded second derivative, and in Koul (1991) when $F$ has only a uniformly continuous density. Thus, in linear autoregressive models the above transformation is useful only when a nonzero mean is present in these models.

6. Fitting a regression model. In this section we shall develop some tests based on innovation processes that will be asymptotically distribution free for fitting a parametric model to the regression function $m(x) := E(Y|X = x)$. Actually, we consider a somewhat more general problem where we fit a parametric model to a general regression function defined as follows.

For a real-valued measurable function $\varphi$ on $\mathbb{R}$, let $\mathcal{F}_\varphi$ denote a class of distribution functions $F$ on $\mathbb{R}$ such that $\varphi \in L_2(\mathbb{R},F)$ and $\int|\varphi(y+t)|\,F(dy) < \infty$ for all $|t| < \kappa < \infty$. Let $m_\varphi(x)$ be defined by the relation

$$E[\varphi(Y - m_\varphi(x))\,|\,X = x] = 0. \tag{6.1}$$

Note that if $\varphi(y) = y$, then $m_\varphi(x) = m(x)$, while if $\varphi(y) = I\{y > 0\} - (1 - \alpha)$, for an $0 < \alpha < 1$, then $m_\varphi(x)$ is the $\alpha$th quantile of the conditional distribution of $Y$, given $X = x$. The choice of $\varphi$ is up to the practitioner. The d.f. $F$ of the error $Y - m_\varphi(X)$ will be assumed to be an unknown member of $\mathcal{F}_\varphi$ for a given $\varphi$.

The problem of testing $H_0$ is now extended to testing the hypothesis $H_\varphi$: $m_\varphi(x) = \mu(x,\theta)$ for some $\theta \in \Theta$, against the alternatives described in (1.6). Consider again the function-parametric regression process

$$\xi_n(\gamma,\varphi,\vartheta) := n^{-1/2}\sum_{i=1}^n\gamma(X_i)\,\varphi(Y_i - \mu(X_i,\vartheta)).$$

Note that, because of (6.1), under $H_\varphi$, $E\xi_n(\gamma,\varphi,\theta) = 0$.

Let $\hat\theta$ be an M-estimator of $\theta$ satisfying (2.9) corresponding to $\gamma = \dot\mu_\vartheta$. Suppose, additionally, that $F \in \mathcal{F}_\varphi$ is such that the function $t \mapsto \int\varphi(y+t)\,F(dy)$, $t \in \mathbb{R}$, is strictly monotonic and differentiable in a neighborhood of $0$. Now, if we consider problems where $\varphi(y)$ is differentiable, such as $\varphi(y) = y$, which is the most interesting case, then we need to assume the regularity condition (2.4) on the regression function $\mu(\cdot,\vartheta)$. In the case of a nondifferentiable $\varphi$, as in, for example, $\varphi(y) = I\{y > 0\} - (1 - \alpha)$, we need to assume as well that $F$, although unknown, also satisfies (1.2). In both cases, under (2.4) and (2.10), $\hat\theta$ satisfies (2.11) and, writing $\xi_n(\gamma,\varphi) := \xi_n(\gamma,\varphi,\hat\theta)$, we obtain

$$\xi_n(\gamma,\varphi) = \xi_n(\gamma,\varphi,\theta) - (\gamma,\dot\mu_\theta^T)C_\theta^{-1}\xi_n(\dot\mu_\theta,\varphi,\theta) + o_p(1)$$
where

$$\gamma_\perp(x) = \gamma(x) - (\gamma,\dot\mu_\theta^T)C_\theta^{-1}\dot\mu_\theta(x), \qquad x \in \mathbb{R}^p,$$

is the part of $\gamma$ orthogonal to $\dot\mu_\theta$, and no transformation of $\varphi$ is involved.

We emphasize that it is only for motivational purposes that we are confining attention here to M-estimators. As we shall see later, any $n^{1/2}$-consistent estimator may be used to construct ADF tests for $H_\varphi$.

Now one can show that under $H_\varphi$, for each $\gamma$, $\varphi$ of the given type,

$$\xi_n(\gamma,\varphi) \to_d b(\gamma_\perp,\varphi), \tag{6.2}$$

while under any sequence of alternatives (1.6),

where $A$ is either $(\varphi',1)$ or $-(\varphi,\psi_f)$, depending on whether we assume (2.4) and (2.5) or (1.2).

As this last result shows, the asymptotic shift of the regression process $\xi_n(\gamma,\varphi)$ under the alternatives (1.6) is the linear functional of $\ell_\theta$ defined by the function $\gamma_\perp$. Therefore, to be able to detect all alternatives of the assumed type, we need a substantial supply of functions $\gamma_\perp$; that is, we need to consider $\xi_n(\gamma,\varphi)$ as a process in $\gamma$, and there is no need to vary $\varphi$ in the way we had to when testing our previous hypothesis $H_0$ with $\gamma$ kept fixed. We do not try to choose an "optimal" $\varphi$ in any sense, because the result would depend on $F$, while we prefer to work under the assumption that this d.f. is unknown. Thus we can and will assume that $\varphi$ is fixed in the rest of this section.

From (6.2) we note that the limiting process, as a function of $\gamma$, is again a projection of Brownian motion, but as a function of $\gamma_\perp$ it is just a Brownian motion.

Now, we have a convenient and customary way to parametrize $b(\gamma,\varphi)$ in $\gamma \in L_2(\mathbb{R}^p,H)$ so as to obtain processes with a standard and convenient distribution, and if we had similar ways to do this in subspaces of $L_2(\mathbb{R}^p,H)$, we could have the same convenient limiting processes in our problem. This, however, is not a straightforward task, as we have said earlier, especially because these subspaces, being orthogonal to $\dot\mu_\theta$, change from one regression function to another, and may well change for the same regression function as the parameter $\theta$ changes.

Nevertheless, we will see below that, given a "convenient" indexing class $\mathcal{G}_0 \subset L_2(\mathbb{R}^p,H)$, in the sense that $\{b(\gamma,\varphi), \gamma \in \mathcal{G}_0\}$ forms a "convenient" asymptotic process (say, we can easily find the distribution of statistics based on $\{b(\gamma,\varphi), \gamma \in \mathcal{G}_0\}$, and so on), we can map it isometrically into the subspace of functions orthogonal to $\dot\mu_\theta$. Thus, we obtain the process $\{b(\gamma,\varphi), \gamma \in \mathcal{G}_0'\}$, where $\mathcal{G}_0'$ is the image of this isometry, which on the one hand has exactly the same distribution, and therefore carries the same "convenience," as the process $\{b(\gamma,\varphi), \gamma \in \mathcal{G}_0\}$, and on the other hand is the limiting process for $\xi_n(\gamma,\varphi)$ if we index it by $\gamma \in \mathcal{G}_0'$.

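To achieve this goal, first introduce the so-called scanning family of measurable subsets $\mathcal{A} := \{A_z : z \in \mathbb{R}\}$ of $\mathbb{R}^p$ such that $A_z \subset A_{z'}$ for all $z \le z'$, $H(A_{-\infty}) = 0$, $H(A_\infty) = 1$, and $H(A_z)$ is a strictly increasing, absolutely continuous function of $z \in \mathbb{R}$.

To give examples, let $X^j$ denote the $j$th coordinate of the $p$-dimensional design variable $X$, $j = 1,\ldots,p$. Suppose that the marginal distribution of $X^j$ is absolutely continuous. Then we can take the family $A_z = \{x \in \mathbb{R}^p : x^j \le z\}$ as a scanning family. Or, if the sum $X^1 + \cdots + X^p$ is absolutely continuous, then one can take the family of half-spaces $A_z = \{x \in \mathbb{R}^p : x^1 + x^2 + \cdots + x^p \le z\}$.

Now let $B^c$ denote the complement of the set $B$, and define

$$z(x) := \inf\{z : A_z \ni x\}, \qquad C_{\theta,z} := \int_{A_z^c}\dot\mu_\theta\dot\mu_\theta^T\,dH,$$

$$\mathcal{T}_\theta\gamma(x) := \int_{A_{z(x)}}\gamma(y)\,\dot\mu_\theta^T(y)\,C_{\theta,z(y)}^{-1}\,dH(y)\;\dot\mu_\theta(x), \qquad x \in \mathbb{R}^p,\ \theta \in \Theta.$$

We shall often write $C_z$, $\mathcal{T}$ for $C_{\theta,z}$, $\mathcal{T}_\theta$, respectively. Now, define the operator

$$\mathcal{K}\gamma(x) := \gamma(x) - \mathcal{T}\gamma(x), \qquad x \in \mathbb{R}^p.$$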

Proposition 6.1. Let $\mathcal{G} := \{\gamma \in L_2(\mathbb{R}^p,H) : (\gamma,\dot\mu_\theta) = 0\}$. Assume $C_z$ is nonsingular for all $-\infty < z < \infty$. Then the transformation $\mathcal{K}$ is a norm preserving transformation from $L_2(\mathbb{R}^p,H)$ to $\mathcal{G}$:

$$\mathcal{K}\gamma \perp \dot\mu_\theta, \qquad \|\mathcal{K}\gamma\| = \|\gamma\|.$$

Consequently, for any fixed $\varphi$, the process $\xi(\mathcal{K}\gamma,\varphi)$ is a (function parametric) Brownian motion in $\gamma$.

Similarly to Proposition 4.2, the following corollary shows that much less is required from an estimator than its asymptotic linearity. The random vector $Z$ below can be thought of as the limit in distribution of $\sqrt{n}(\hat\theta - \theta)$.

Corollary 6.1. Let $\zeta$ be any process of the form

$$\zeta(\gamma,\varphi) = b(\gamma,\varphi) - (\gamma,\dot\mu_\theta^T)Z,$$

where $Z$ is a random vector (not necessarily Gaussian) in $\mathbb{R}^q$. Then for any fixed $\varphi \in L_2(\mathbb{R},F)$, the process

$$\zeta(\mathcal{K}\gamma,\varphi)$$

is a Brownian motion in $\gamma \in L_2(\mathbb{R}^p,H)$.
Now we shall, as an example, focus on the case $\gamma = I_B$, for $B$ a Borel set in $\mathbb{R}^p$. Then

$$\mathcal{K}I_B(x) = I_B(x) - \int_{A_{z(x)}} I_B(y)\,\dot\mu_\theta^T(y)\,C_{z(y)}^{-1}\,dH(y)\;\dot\mu_\theta(x).$$

In view of the above discussion, our transformed process is

$$W_n(B) := \xi_n(\mathcal{K}I_B,\varphi) = n^{-1/2}\sum_{i=1}^n\big[I_B(X_i) - \mathcal{T}I_B(X_i)\big]\,\varphi(Y_i - \mu(X_i,\hat\theta)). \tag{6.3}$$
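To make (6.3) concrete, here is a sketch in a deliberately simple, hypothetical setting that is not one of the paper's examples. The covariate $X$ is uniform on $[0,1]$ (so $H$ is Lebesgue measure there), $\mu(x,\vartheta) = \vartheta x$ (so $\dot\mu_\theta(x) = x$), the scanning family is $A_z = \{x \le z\}$, $\varphi(e) = e$ with $Ee^2 = 1$, and $B = [0,v]$ with $v < 1$, so that $C_z = (1 - z^3)/3$ stays away from zero on the relevant range. Then $\mathcal{T}I_B(x) = x\int_0^{\min(x,v)} 3y/(1 - y^3)\,dy$. The test checks $(\mathcal{K}I_B,\dot\mu_\theta) = 0$ numerically and illustrates that replacing $\theta$ by the least squares estimate barely moves $W_n(B)$. All function names are illustrative.

```python
import math
import random

# Hypothetical toy setting: X ~ Uniform(0, 1), mu(x, theta) = theta * x, mu_dot(x) = x,
# scanning family A_z = {x <= z}, so z(x) = x and C_z = int_z^1 x^2 dx = (1 - z^3) / 3.


def make_r(v, n=4000):
    """Tabulate r(u) = int_0^u y C_y^{-1} dy = int_0^u 3y / (1 - y^3) dy for 0 <= u <= v < 1."""
    step = v / n
    grid = [k * step for k in range(n + 1)]
    vals = [3.0 * y / (1.0 - y ** 3) for y in grid]
    cum = [0.0]
    for k in range(1, n + 1):
        cum.append(cum[-1] + 0.5 * step * (vals[k - 1] + vals[k]))  # trapezoid rule

    def r(u):
        u = min(max(u, 0.0), v)
        k = min(int(u / step), n - 1)
        w = (u - grid[k]) / step
        return cum[k] + w * (cum[k + 1] - cum[k])  # linear interpolation

    return r


def k_indicator(x, v, r):
    """K I_B(x) for B = [0, v]: I_B(x) - (int_{y <= min(x, v)} y C_y^{-1} dy) * x."""
    return (1.0 if x <= v else 0.0) - r(min(x, v)) * x


def w_n(v, xs, ys, theta, r):
    """W_n(B) of (6.3) for B = [0, v] with phi(e) = e and residuals y - theta * x."""
    total = sum(k_indicator(x, v, r) * (y - theta * x) for x, y in zip(xs, ys))
    return total / math.sqrt(len(xs))


v = 0.7
r = make_r(v)
m = 20000
# Midpoint rule for (K I_B, mu_dot) = int_0^1 K I_B(x) x dx, which should vanish.
print(sum(k_indicator((j + 0.5) / m, v, r) * (j + 0.5) / m for j in range(m)) / m)
```

The design choice worth noting is that $\mathcal{K}I_B \perp \dot\mu_\theta$ makes the estimation term $-(\mathcal{K}I_B,\dot\mu_\theta^T)\sqrt{n}(\hat\theta - \theta)$ vanish asymptotically, which is why $W_n$ is insensitive to the choice of a root-$n$ consistent $\hat\theta$.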



We do not consider in this paper the problem of weak convergence of the transformed processes $\{\xi_n(\gamma,\mathcal{C}\varphi), \varphi \in \Phi_0\}$ or $\{\xi_n(\mathcal{K}\gamma,\varphi), \gamma \in \mathcal{G}_0\}$ to the corresponding Brownian motions for appropriate indexing classes $\Phi_0$ and $\mathcal{G}_0$ in full generality. Nevertheless, we shall now state a sufficient condition under which the process (6.3) converges weakly to a set-parametric Brownian motion on a practically important class of sets, namely a subclass $\mathcal{B}_0$ of all right-closed rectangles in $\mathbb{R}^p$, that is, $\mathcal{B}_0 \subset \{(-\infty, v], v \in \mathbb{R}^p\}$. Our assumption is the following:

(6.4) There exists a $\tau > 0$ such that $B \subset A_{1-\tau}$ for all $B \in \mathcal{B}_0$.

This condition is not necessary, but it simplifies the proof substantially. See Khmaladze (1993) for the version without this condition.

Let $\{w(B), B \in \mathcal{B}_0\}$ be a set-parametric Brownian motion on $\mathcal{B}_0$ with covariance function

$$Ew(B)w(B') = cH(B \cap B'),$$

where, without loss of generality, we can assume the constant $c$ to be 1; compare Remark 6.1.

The space in which we will consider weak convergence of $W_n$ will be $\ell^\infty(\mathcal{G}_0)$, where $\mathcal{G}_0 = \{I_B(\cdot), B \in \mathcal{B}_0\}$ is equipped with the $L_2$-norm. [See, e.g., page 34 in van der Vaart and Wellner (1996).] Now, write $\hat\varepsilon_i$, $\varepsilon_i$ for $\varepsilon_i(\hat\theta)$, $\varepsilon_i(\theta)$, respectively. Also, let $e$ denote a r.v. having the same distribution as $\varepsilon_1(\theta)$.

Proposition 6.2. Suppose the regularity conditions (2.4) and (2.5) are satisfied. Suppose also that $E\varphi^2(e) = 1$ and $\hat\theta$ is any estimator such that $\sqrt{n}(\hat\theta - \theta) = O_p(1)$. If $\mathcal{B}_0$ is such that (6.4) is satisfied, then, under $H_\varphi$,

$$W_n \Rightarrow w \quad \text{in } \ell^\infty(\mathcal{G}_0).$$
Remark 6.1. In the definition (6.3) of the process $W_n$ we assumed that $E\varphi^2(e) = 1$ without loss of generality. Indeed, we can always replace $\varphi(\hat\varepsilon_i)$ by $\varphi(\hat\varepsilon_i)/\hat\sigma$ in $W_n$, where $\hat\sigma^2 = n^{-1}\sum_{i=1}^n\varphi^2(\hat\varepsilon_i)$ is an estimator of $\sigma^2 = E\varphi^2(e)$. Then it is obvious that the processes which incorporate $\varphi(\hat\varepsilon_i)/\hat\sigma$ and $\varphi(\hat\varepsilon_i)/\sigma$, respectively, will converge to each other in probability, uniformly in $B$.

Since the kernel of the transformation $\mathcal{T}$ depends on $\theta$, we will certainly need to replace it with an estimator. It seems simplest to use the same estimator $\hat\theta$ as is used in $\xi_n$, although this is not necessary, and in principle any consistent estimator can be used: a small perturbation of $\theta$ in $\mathcal{T}_\theta$ will only slightly perturb the process $\xi_n(\mathcal{T}_\theta I_B,\varphi)$. To prove this latter statement formally, we need to complement (2.4) by the following two mild assumptions.

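Let

$$d^2(\vartheta_1,\vartheta_2) := E\|\dot\mu_{\vartheta_1}(X) - \dot\mu_{\vartheta_2}(X)\|^2\,E\varphi^2(e), \qquad \rho(\delta) := \sup_{\|\vartheta_1 - \vartheta_2\| \le \delta} d(\vartheta_1,\vartheta_2), \qquad \vartheta_1,\vartheta_2 \in \Theta,\ \delta > 0.$$

Suppose that $E\varphi^2(e) = 1$, and that for some $\epsilon > 0$,

$$\sup_{\|\vartheta_1 - \vartheta_2\| \le \epsilon}\Big|\frac{1}{n}\sum_{i=1}^n\|\dot\mu_{\vartheta_1}(X_i) - \dot\mu_{\vartheta_2}(X_i)\|^2\varphi^2(\varepsilon_i) - d^2(\vartheta_1,\vartheta_2)\Big| = o_p(1), \quad \text{as } n \to \infty, \tag{6.5}$$

$$\sum_{k=0}^\infty k\,\rho(\epsilon 2^{-k}) < \infty. \tag{6.6}$$

Define the estimated transformed process:

$$\hat W_n(B) := \xi_n(I_B,\varphi) - \xi_n(\mathcal{T}_{\hat\theta}I_B,\varphi).$$

We have the following statement.

Proposition 6.3. Let $\{I_B, B \in \mathcal{B}_0\}$ be any collection of indicator functions such that $\mathcal{B}_0$ satisfies (6.4). Then, under the assumptions (6.5) and (6.6),

$$\sup_{B \in \mathcal{B}_0}|\hat W_n(B) - W_n(B)| = o_p(1).$$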

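To prove this last proposition we will use the following lemma, which is of independent interest. Let, for a $c > 0$,

$$D_n = \Big\{\sup_{\|\vartheta_1 - \vartheta_2\| \le \delta}\frac{1}{n}\sum_{i=1}^n\|\dot\mu_{\vartheta_1}(X_i) - \dot\mu_{\vartheta_2}(X_i)\|^2\varphi^2(\varepsilon_i) \le (1 + c)\rho^2(\delta)\ \text{for all } 0 < \delta \le \epsilon\Big\}.$$

Lemma 6.1. Let $\{I_A, A \in \mathcal{A}'\}$ be any collection of indicator functions. Then, under the assumptions (6.5) and (6.6),

$$P\Big[\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(I_A\dot\mu_\vartheta,\varphi) - \xi_n(I_A\dot\mu_\theta,\varphi)\| > x\,\Big|\,D_n\Big] \le \exp\Big\{-\frac{x/2}{C\sum_{k=0}^\infty k\rho(\epsilon 2^{-k})}\Big\},$$

$$E\Big[\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(I_A\dot\mu_\vartheta,\varphi) - \xi_n(I_A\dot\mu_\theta,\varphi)\|^2\,\Big|\,D_n\Big] \le C\Big(\sum_{k=0}^\infty k\rho(\epsilon 2^{-k})\Big)^2 \to 0,$$

as $\epsilon \to 0$, where $C$ is a positive universal constant.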
Now we prove all three propositions and the lemma. 



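Proof of Proposition 6.1. Fix a $k < \infty$ and consider $\gamma_k := \gamma I_{A_k}$. We shall first show that $(\mathcal{K}\gamma_k,\dot\mu_\theta^T) = 0$. Note that $y \in A_{z(x)}$ is equivalent to $x \in A^c_{z(y)}$ for almost all $x, y$ with respect to the measure $H$. This fact, together with changing the order of integration, yields

$$\int\mathcal{K}\gamma_k(x)\dot\mu_\theta^T(x)\,dH(x) = (\gamma_k,\dot\mu_\theta^T) - \int\int_{A_{z(x)}}\gamma_k(y)\dot\mu_\theta^T(y)C_{z(y)}^{-1}\,dH(y)\;\dot\mu_\theta(x)\dot\mu_\theta^T(x)\,dH(x)$$

$$= (\gamma_k,\dot\mu_\theta^T) - \int\gamma_k(y)\dot\mu_\theta^T(y)C_{z(y)}^{-1}\int_{A^c_{z(y)}}\dot\mu_\theta(x)\dot\mu_\theta^T(x)\,dH(x)\,dH(y)$$

$$= (\gamma_k,\dot\mu_\theta^T) - (\gamma_k,\dot\mu_\theta^T) = 0.$$

Now we shall show that

$$(\mathcal{K}\gamma_k,\mathcal{K}\gamma_k) = (\gamma_k,\gamma_k). \tag{6.7}$$

Using the notation

$$p_k^T(z) := \int_{A_z}\gamma_k(y)\dot\mu_\theta^T(y)C_{z(y)}^{-1}\,dH(y), \qquad z \in \mathbb{R},$$

so that $\mathcal{T}\gamma_k(x) = p_k^T(z(x))\dot\mu_\theta(x)$, rewrite

$$(\mathcal{K}\gamma_k,\mathcal{K}\gamma_k) = (\gamma_k,\gamma_k) - 2\int p_k^T(z(x))\dot\mu_\theta(x)\gamma_k(x)\,dH(x) + \int p_k^T(z(x))\dot\mu_\theta(x)\dot\mu_\theta^T(x)p_k(z(x))\,dH(x)$$

$$= (\gamma_k,\gamma_k) - \int\big[2p_k^T(z)C_z\,dp_k(z) + p_k^T(z)\,dC_z\,p_k(z)\big]$$

$$= (\gamma_k,\gamma_k) - p_k^T(z)C_zp_k(z)\Big|_{-\infty}^{\infty}.$$

Because $\gamma_k = \gamma I_{A_k}$, the function $p_k$ remains bounded as $z \to \infty$ and hence the substitution in the above equals zero, thereby proving (6.7).

Next, by definition $\gamma_k \to \gamma$ as $k \to \infty$. Let $k \to \infty$ in (6.7) to conclude that it remains true for a general $\gamma \in L_2(\mathbb{R}^p,H)$. □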

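Proof of Proposition 6.2. Using the definition of the operator $\mathcal{K}$, one can write

$$\sup_{B \in \mathcal{B}_0}|W_n(B) - \xi_n(\mathcal{K}I_B,\varphi,\theta)| \le \sup_{B \in \mathcal{B}_0}\big|\xi_n(I_B,\varphi) - \xi_n(I_B,\varphi,\theta) + E\varphi'(e)\,E(I_B\dot\mu_\theta^T)\,n^{1/2}(\hat\theta - \theta)\big|$$

$$+ \sup_{B \in \mathcal{B}_0}\big|\xi_n(\mathcal{T}I_B,\varphi) - \xi_n(\mathcal{T}I_B,\varphi,\theta) + E\varphi'(e)\,E(\mathcal{T}I_B\,\dot\mu_\theta^T)\,n^{1/2}(\hat\theta - \theta)\big|;$$

the two added terms coincide because $(\mathcal{K}I_B,\dot\mu_\theta^T) = 0$, so they cancel in the difference. However, Proposition 2.1 implies that the first supremum on the right-hand side is $o_p(1)$. To deal with the second supremum, let us use the fact that $I_{A_{z(x)}}(y) = I_{A^c_{z(y)}}(x)$ a.e. and change the order of summation and integration:

$$\xi_n(\mathcal{T}I_B,\varphi) = \int I_B(y)\,\dot\mu_\theta^T(y)\,C_{z(y)}^{-1}\,\xi_n(I_{A^c_{z(y)}}\dot\mu_\theta,\varphi)\,dH(y).$$

A similar equality is certainly true for $\xi_n(\mathcal{T}I_B,\varphi,\theta)$. Therefore, using Proposition 2.1 once again, we obtain

$$\sup_{B \in \mathcal{B}_0}\big|\xi_n(\mathcal{T}I_B,\varphi) - \xi_n(\mathcal{T}I_B,\varphi,\theta) + E\varphi'(e)\,E(\mathcal{T}I_B\,\dot\mu_\theta^T)\,n^{1/2}(\hat\theta - \theta)\big| = o_p(1). \qquad □$$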

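Proof of Proposition 6.3. First note that, from the previous proof, we have

$$\hat W_n(B) - W_n(B) = \xi_n(\mathcal{T}_\theta I_B,\varphi) - \xi_n(\mathcal{T}_{\hat\theta}I_B,\varphi).$$

Now let

$$\eta(y,\vartheta) := C_{\vartheta,z(y)}^{-1}\dot\mu_\vartheta(y)$$

and

$$\xi_n(z,\vartheta) := \xi_n(I_{A_z^c}\dot\mu_\vartheta,\varphi).$$

Then we can rewrite

$$\xi_n(\mathcal{T}_\vartheta I_B,\varphi) = \int I_B(y)\,\eta^T(y,\vartheta)\,\xi_n(z(y),\vartheta)\,dH(y).$$

Since

$$|\xi_n(\mathcal{T}_{\hat\theta}I_B,\varphi) - \xi_n(\mathcal{T}_\theta I_B,\varphi)| \le I\{\|\hat\theta - \theta\| \le \epsilon\}\sup_{\|\vartheta - \theta\| \le \epsilon}|\xi_n(\mathcal{T}_\vartheta I_B,\varphi) - \xi_n(\mathcal{T}_\theta I_B,\varphi)| + I\{\|\hat\theta - \theta\| > \epsilon\}\,|\xi_n(\mathcal{T}_{\hat\theta}I_B,\varphi) - \xi_n(\mathcal{T}_\theta I_B,\varphi)|$$

and $\hat\theta$ is a consistent estimator, it is enough to prove that

$$\sup_{B \in \mathcal{B}_0,\ \|\vartheta - \theta\| \le \epsilon}\int I_B(y)\,\big|\eta^T(y,\vartheta)\xi_n(z(y),\vartheta) - \eta^T(y,\theta)\xi_n(z(y),\theta)\big|\,dH(y) = o_p(1)$$

as $\epsilon \to 0$ and $n \to \infty$. Using the Cauchy-Schwarz inequality and the fact that $B \subset A_{1-\tau}$, we find that the left-hand side of the above equality is bounded above by

$$\sup_{\|\vartheta - \theta\| \le \epsilon}\|\eta(\cdot,\vartheta) - \eta(\cdot,\theta)\|_H\,\|\xi_n(\cdot,\vartheta)\|_H + \|\eta(\cdot,\theta)\|_H\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(\cdot,\vartheta) - \xi_n(\cdot,\theta)\|_H,$$

where $\|\cdot\|_H$ is the $L_2$ norm with respect to $H$.

Since $C_z$ is nonsingular for $z \le 1 - \tau$, we have $\|\eta(\cdot,\theta)\|_H < \infty$. Moreover, $\dot\mu_\vartheta$ being continuous in $\vartheta$ in the mean square sense [condition (2.4)], it follows that for all sufficiently small $\epsilon$, $C_{\vartheta,z}$ is nonsingular for all $\|\vartheta - \theta\| \le \epsilon$, $z \le 1 - \tau$, and that $\sup_{\|\vartheta - \theta\| \le \epsilon}\|\eta(\cdot,\vartheta) - \eta(\cdot,\theta)\|_H$ is small. What remains, therefore, is to show that $\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(\cdot,\vartheta)\|_H = O_p(1)$, and that $\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(\cdot,\vartheta) - \xi_n(\cdot,\theta)\|_H = o_p(1)$ as $n \to \infty$ and $\epsilon \to 0$. These properties are proved in Lemma 6.1. □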
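Proof of Lemma 6.1. First note that a symmetrization lemma [see, e.g., van der Vaart and Wellner (1996), Section 2.3.2] can be used to imply that

$$E\|\xi_n(z,\vartheta_1) - \xi_n(z,\vartheta_2)\| \le 2E\|\xi_n^0(z,\vartheta_1) - \xi_n^0(z,\vartheta_2)\|,$$

where

$$\xi_n^0(z,\vartheta) := n^{-1/2}\sum_{i=1}^n e_i\,I_{A_z^c}(X_i)\,\dot\mu_\vartheta(X_i)\,\varphi(\varepsilon_i),$$

and $\{e_i\}_{i=1}^n$ are Rademacher random variables independent of $\{(X_i,Y_i)\}_{i=1}^n$. Averaging first over $\{e_i\}_{i=1}^n$, we obtain, for all $t > 0$,

$$E\big[\exp\{t^{-1}\|\xi_n^0(z,\vartheta_1) - \xi_n^0(z,\vartheta_2)\|\}\,\big|\,D_n\big] \le E\Big[\exp\Big\{2t^{-2}\frac{1}{n}\sum_{i=1}^n\|\dot\mu_{\vartheta_1}(X_i) - \dot\mu_{\vartheta_2}(X_i)\|^2\varphi^2(\varepsilon_i)\Big\}\,\Big|\,D_n\Big] \le \exp\{2t^{-2}(1 + c)\rho^2(\|\vartheta_1 - \vartheta_2\|)\}.$$

Following van der Vaart and Wellner (1996), Section 2.2, denote by $\|X\|_{\psi,D_n}$ the Orlicz norm of the random variable $X$ induced by the function $\psi(x) = e^x - 1$; this is the smallest constant $t$ such that $E[\exp(|X|/t) - 1\,|\,D_n] \le 1$. Then the previous inequality implies that

$$\|\xi_n^0(z,\vartheta_1) - \xi_n^0(z,\vartheta_2)\|_{\psi,D_n} \le C\rho(\|\vartheta_1 - \vartheta_2\|).$$

Since $e^{x/t} - 1 \ge (x/t)^2/2!$, it immediately follows that

$$E\Big[\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(y,\vartheta) - \xi_n(y,\theta)\|^2\,\Big|\,D_n\Big] \le 2\Big\|\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(y,\vartheta) - \xi_n(y,\theta)\|\Big\|_{\psi,D_n}^2.$$

We now show that the Orlicz norm on the right-hand side is small for small $\epsilon$. We will do this by slightly adjusting the chaining argument. Let $N(\delta)$ be the covering number [the cardinality of the minimal $\delta$-net $M(\delta)$] of the unit ball in $\mathbb{R}^q$. Let each $\zeta_{k+1} \in M(2^{-k-1})$ be linked to a unique $\zeta_k \in M(2^{-k})$ in such a way that $\|\zeta_{k+1} - \zeta_k\| \le 2^{-k}$. Then, using Lemma 2.2.2 of van der Vaart and Wellner (1996), Section 2.2, one can write (with $\vartheta - \theta = \epsilon\zeta$)

$$\Big\|\max\|\xi_n(y,\vartheta_k) - \xi_n(y,\vartheta_{k+1})\|\Big\|_{\psi,D_n} \le C\ln N(2^{-k})\,\rho(\epsilon 2^{-k}).$$

Hence

$$\Big\|\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n^0(y,\vartheta) - \xi_n^0(y,\theta)\|\Big\|_{\psi,D_n} \le \sum_{k=1}^\infty\Big\|\max\|\xi_n(y,\vartheta_k) - \xi_n(y,\vartheta_{k+1})\|\Big\|_{\psi,D_n} \le C\sum_{k=1}^\infty\ln N(2^{-k})\,\rho(\epsilon 2^{-k}) \le C\sum_{k=1}^\infty k\,\rho(\epsilon 2^{-k}), \tag{6.8}$$

where the last inequality follows from the obvious estimate $N(\delta) \le C\delta^{-q}$. Since $\rho(\epsilon 2^{-k}) \to 0$ as $\epsilon \to 0$ and the series converges for some $\epsilon > 0$, it tends to 0 as $\epsilon \to 0$.

Finally, combine the symmetrization and Markov inequalities to obtain

$$P\Big(\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(y,\vartheta) - \xi_n(y,\theta)\| > x\,\Big|\,D_n\Big) \le P\Big(\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n^0(y,\vartheta) - \xi_n^0(y,\theta)\| > \frac{x}{2}\,\Big|\,D_n\Big)$$

$$\le E\Big[\exp\Big\{t^{-1}\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n^0(y,\vartheta) - \xi_n^0(y,\theta)\|\Big\}\,\Big|\,D_n\Big]\exp\Big\{-\frac{x}{2t}\Big\}.$$

From the definition of the Orlicz norm $\|\sup_{\|\vartheta - \theta\| \le \epsilon}\|\xi_n(y,\vartheta) - \xi_n(y,\theta)\|\|_{\psi,D_n}$ and the inequality (6.8), it follows that the expectation above does not exceed 2 for $t = C\sum_{k=1}^\infty k\rho(\epsilon 2^{-k})$. Hence the inequality of the lemma. □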

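We end this section by pointing out that the conditions (6.5) and (6.6) are trivially satisfied in the case $\mu(x,\vartheta) = \vartheta'S(x)$, where $S(x)$ is a vector of functions of $x$ with finite second moment $E\|S(X)\|^2$: then $\dot\mu_\vartheta = S$ does not depend on $\vartheta$, so that $d \equiv 0$ and $\rho \equiv 0$.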

7. Some simulations. This section presents some simulations to see how well the finite sample level of significance is approximated by the asymptotic level for the supremum of the absolute values of the transformed processes defined at (5.3) and (6.3). We find that when fitting a standard normal distribution to the errors with a rapidly changing regression function, or when fitting a two-variable linear regression model with standard normal errors and using the least squares residuals, this approximation is very good even for the sample size 40, especially in the right tail.

The lack of an analytical form of the distribution of the supremum of the Brownian motion on $[0,1]^2$ created an extra difficulty here. We had to first obtain a simulated approximation to this distribution. This was done by simulating an appropriate two-time-parameter Poisson process of sample size 5K, with 20K replications. Selected quantiles based on this simulation are presented in Table 2 of Section 7.2. These should also be of independent interest.

7.1. $\sup_z|w_{n1}(z)|$ of (5.3). This section presents some selected empirical percentiles of the transformed statistic $D_n := \sup_z|w_{n1}(z)|$ of (5.3) for testing $H_0$: $F$ is the standard normal d.f. The regression function is taken to be $\mu(x,\vartheta) = e^{\vartheta x}$, with true $\theta = 0.25$; the regressors $X_i$, $i = 1,\ldots,n$, are chosen to be uniformly distributed on $[2,4]$, and the errors $\varepsilon_i = \varepsilon_i(\theta)$, $i = 1,\ldots,n$, are standard Gaussian. In this case, with $t = F(y)$, the inverse of the matrix $\Gamma_t$ of Section 4 becomes

$$\Gamma_t^{-1} = \frac{1}{[1 - F(y)][ya(y) + 1 - a^2(y)]}\begin{pmatrix} 1 + ya(y) & a(y) \\ a(y) & 1 \end{pmatrix},$$

where $a(y) = f(y)/(1 - F(y))$, with $f$ and $F$ denoting the standard normal density and d.f., respectively. Consequently, the vector function $G$ of (5.3) is now equal to

$$G(z) = \int_{-\infty}^{z}\frac{1}{ya(y) + 1 - a^2(y)}\begin{pmatrix} 1 \\ a(y) - y \end{pmatrix}a(y)\,dy$$

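and, eventually, the transformed process of (5.3) has the form

$$w_{n1}(t) = n^{-1/2}\sum_{i=1}^n\Big[I\{\varepsilon_i(\hat\theta) \le z\} - \int_{-\infty}^{z \wedge \varepsilon_i(\hat\theta)}\frac{1 - \varepsilon_i(\hat\theta)(a(y) - y)}{ya(y) + 1 - a^2(y)}\,a(y)\,dy\Big], \qquad t = F(z). \tag{7.1}$$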
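A numerical cross-check of the closed form for the standard normal case, assuming $h(y) = (1,-y)^T$ (that is, $\psi_f = f'/f$; the product $h^T(\varepsilon)\Gamma^{-1}h(y)$ is unchanged if the sign of the second coordinate of $h$ is flipped). The kernel $h^T(\varepsilon)\Gamma_{F(y)}^{-1}h(y)f(y)$ is computed once by inverting a quadrature approximation of $\int_{z \ge y} h h^T\,dF$, and once from the expression $a(y)[1 - \varepsilon(a(y) - y)]/[ya(y) + 1 - a^2(y)]$. The function names are illustrative.

```python
import math
from statistics import NormalDist

ND = NormalDist()  # F = standard normal; h(y) = (1, -y) is an assumption of this sketch


def gamma_numeric(y, upper=12.0, n=24000):
    """Gamma at t = F(y): Simpson approximation of int_{z >= y} h(z) h(z)^T dF(z)."""
    step = (upper - y) / n
    s11 = s12 = s22 = 0.0
    for k in range(n + 1):
        z = y + k * step
        wgt = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        p = wgt * ND.pdf(z)
        s11 += p           # 1 * 1
        s12 += -z * p      # 1 * (-z)
        s22 += z * z * p   # (-z) * (-z)
    c = step / 3.0
    return s11 * c, s12 * c, s22 * c


def kernel_numeric(eps, y):
    """h(eps)^T Gamma^{-1} h(y) f(y), with Gamma inverted explicitly as a 2 x 2 matrix."""
    a11, a12, a22 = gamma_numeric(y)
    det = a11 * a22 - a12 * a12
    hy1, hy2 = 1.0, -y
    u1 = (a22 * hy1 - a12 * hy2) / det
    u2 = (-a12 * hy1 + a11 * hy2) / det
    return (u1 - eps * u2) * ND.pdf(y)  # h(eps) = (1, -eps)


def kernel_closed(eps, y):
    """Closed form: a(y) [1 - eps (a(y) - y)] / [y a(y) + 1 - a(y)^2]."""
    a = ND.pdf(y) / (0.5 * math.erfc(y / math.sqrt(2.0)))
    return a * (1.0 - eps * (a - y)) / (y * a + 1.0 - a * a)


for eps, y in [(0.3, -1.0), (-1.2, 0.0), (2.0, 1.5)]:
    print(eps, y, kernel_numeric(eps, y), kernel_closed(eps, y))
```

The tail $1 - F(y)$ is computed with `math.erfc` rather than `1 - cdf(y)` to avoid cancellation for larger $y$.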



Although the form of the regression function does not enter the martingale transformation $\mathcal{C}$, it may still affect the finite sample behavior of the transformed process insofar as it affects $\varepsilon_i(\hat\theta)$, $i = 1,\ldots,n$, where $\hat\theta$ is the MLE under the null hypothesis. It was thus of interest to see whether the estimation of $\theta$ affects the values of $\varepsilon_i(\hat\theta)$, $i = 1,\ldots,n$, too much and worsens the convergence of the transformed process to its limit. For this reason we chose a more or less rapidly changing regression function. On the other hand, there was no point in choosing multidimensional regressors $X_i$ here, since the transformed process depends solely on $\varepsilon_i(\hat\theta)$, $i = 1,\ldots,n$.
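A sketch of how $w_{n1}$ and $D_n$ might be computed from a vector of residuals when $F$ is standard normal, under stated assumptions: $h(y) = (1,-y)^T$, the two components of $G$ are tabulated by the trapezoid rule on a grid starting at $z = -8$ (in place of $-\infty$), and the supremum is approximated over an auxiliary grid plus both one-sided limits at each residual (no ties). These numerical choices and all function names are illustrative; they are not the authors' code. The test also checks the centering property $E[I\{\varepsilon \le z\} - h^T(\varepsilon)G(z \wedge \varepsilon)] = 0$ by Monte Carlo at $z = 0$.

```python
import bisect
import math
import random
from statistics import NormalDist

ND = NormalDist()  # null error d.f. F = standard normal, as in Section 7.1


def _rates(y):
    """Components of Gamma^{-1} h(y) f(y) with h(y) = (1, -y): (a/D, a (a - y) / D)."""
    a = ND.pdf(y) / (0.5 * math.erfc(y / math.sqrt(2.0)))  # a(y) = f / (1 - F)
    d = y * a + 1.0 - a * a                                 # D(y) = y a + 1 - a^2
    return a / d, a * (a - y) / d


def tabulate_g(z_max, z_min=-8.0, step=1e-3):
    """Return z -> (G1(z), G2(z)), G(z) = int_{-inf}^z rates(y) dy (trapezoid + interpolation)."""
    n = int(math.ceil((z_max - z_min) / step))
    grid = [z_min + k * step for k in range(n + 1)]
    g1, g2 = [0.0], [0.0]
    prev = _rates(grid[0])
    for k in range(1, n + 1):
        cur = _rates(grid[k])
        g1.append(g1[-1] + 0.5 * step * (prev[0] + cur[0]))
        g2.append(g2[-1] + 0.5 * step * (prev[1] + cur[1]))
        prev = cur

    def g(z):
        if z <= grid[0]:
            return 0.0, 0.0  # normal mass below z_min is negligible
        k = min(bisect.bisect_right(grid, z) - 1, n - 1)
        w = (z - grid[k]) / step
        return g1[k] + w * (g1[k + 1] - g1[k]), g2[k] + w * (g2[k + 1] - g2[k])

    return g


def w_n1(z, res, g):
    """Transformed process (7.1) at t = F(z): sum of I{e <= z} - h(e)^T G(min(z, e))."""
    total = 0.0
    for e in res:
        big_g1, big_g2 = g(min(z, e))
        total += (1.0 if e <= z else 0.0) - (big_g1 - e * big_g2)
    return total / math.sqrt(len(res))


def d_statistic(res, n_grid=400):
    """Approximate D_n = sup |w_n1|; w_n1 is constant beyond max(res)."""
    g = tabulate_g(max(res) + 1e-6)
    lo, hi = max(min(res) - 3.0, -7.9), max(res)
    best = max(abs(w_n1(lo + (hi - lo) * k / n_grid, res, g)) for k in range(n_grid + 1))
    jump = 1.0 / math.sqrt(len(res))
    for e in res:
        right = w_n1(e, res, g)
        best = max(best, abs(right), abs(right - jump))  # right value and left limit
    return best


random.seed(0)
print(d_statistic([random.gauss(0.0, 1.0) for _ in range(100)]))
```

Here the residuals are simulated directly from the null distribution, so the effect of estimating $\theta$ discussed above is not reproduced by this sketch.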

We simulated $\{(X_i, Y_i = e^{0.25X_i} + \varepsilon_i), 1 \le i \le n\}$ for sample sizes $n = 40, 100$, and for each sample calculated the value of the Kolmogorov-Smirnov statistic $D_n := \sup\{|w_{n1}(t)| : 0 \le t \le 1\}$, with $w_{n1}(t)$ as in (7.1). This was done $m = 10$K times. In Table 1, $d_\alpha$ is the $100(1-\alpha)\%$ percentile of the limiting distribution of $D_n$. The values are obtained by approximating the d.f. of the
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Table 1 
Selected quantiles of P{Dn > da) 



a 


0.2 


0.1 


0.05 


0.025 


0.01 


n \ da 


1.64 


1.96 


2.24 


2.50 


2.81 


40 
100 


0.168 
0.178 


0.084 
0.093 


0.046 
0.052 


0.029 
0.029 


0.019 
0.014 



supremum of the Brownian motion over [0, 1] by Q{z) := P(supo<i<i \S,nit) — 
nt\l \fn < z), with n = 5K, where Cn{t), t G [0, 1], is a Poisson process with 
intensity n. The d.f. Q was calculated using the exact recurrence formulas 
and code given in Khmaladze and Shinjikashvili (2001). The values obtained 
are accurate to 5 • 10"^. 
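The percentiles $d_\alpha$ in Table 1 can be cross-checked without the Poisson approximation. The Python sketch below uses the classical reflection series for the d.f. of $\sup_{0 \le t \le 1}|W(t)|$, namely $P(\sup|W| < z) = \sum_{k \in \mathbb{Z}} (-1)^k [\Phi((2k+1)z) - \Phi((2k-1)z)]$, and inverts it by bisection. This is a different technique from the exact recurrence of Khmaladze and Shinjikashvili (2001) used for $Q$; the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal d.f. Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sup_abs_bm_cdf(z, terms=50):
    """P(sup_{0<=t<=1} |W(t)| < z) by the reflection series, truncated to |k| <= terms."""
    if z <= 0.0:
        return 0.0
    total = 0.0
    for k in range(-terms, terms + 1):
        sign = 1.0 if k % 2 == 0 else -1.0
        total += sign * (norm_cdf((2 * k + 1) * z) - norm_cdf((2 * k - 1) * z))
    return total


def sup_abs_bm_quantile(alpha, lo=0.5, hi=6.0):
    """d_alpha with P(sup |W| > d_alpha) = alpha, found by bisection on [lo, hi]."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - sup_abs_bm_cdf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print([round(sup_abs_bm_quantile(a), 2) for a in (0.2, 0.1, 0.05, 0.025, 0.01)])
```

Rounded to two decimals, the five quantiles should agree with the $d_\alpha$ row of Table 1.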

Table 1 also gives the Monte Carlo estimates of $P(D_n > d_\alpha)$ for $n = 40$ and $n = 100$ based on $m = 10K$ replications. The resulting (simulated) distribution functions of $D_n$, along with $Q$ (solid line), are shown in Figure 1. The quality of the approximation appears to be quite close to that in the classical case of the empirical process and its limiting Brownian bridge, especially in the upper tail, where it is needed most.
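Each entry of Table 1 is a rejection frequency over $m = 10K$ independent replications, so its Monte Carlo standard error is $\sqrt{p(1 - p)/m}$. A short sketch of this arithmetic for the $\alpha = 0.05$ column follows; the helper names are hypothetical.

```python
import math


def mc_standard_error(p, m):
    """Binomial standard error of a Monte Carlo rejection frequency based on m replications."""
    return math.sqrt(p * (1.0 - p) / m)


def z_score(p_hat, alpha, m):
    """Distance of a simulated level p_hat from the nominal level alpha, in standard errors."""
    return (p_hat - alpha) / mc_standard_error(alpha, m)


# Table 1 entries for alpha = 0.05, each based on m = 10K replications.
for n, p_hat in ((40, 0.046), (100, 0.052)):
    print(n, round(z_score(p_hat, 0.05, 10_000), 2))  # prints "40 -1.84", then "100 0.92"
```

The same two functions apply to any other entry of Table 1 or Table 4.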

Fig. 1. E.d.f. of $D_n$ for $n = 40, 100$, $m = 10K$, and $Q$ (curves labeled $n = 40$, $n = 100$ and $n = \infty$).

7.2. $\sup_B |w_n(B)|$ of (6.3). Here the regressors $X_i$, $i = 1, \dots, n$, are two-dimensional Gaussian random vectors with standard normal marginal distributions and correlation $r$. The regression function being fitted is chosen to be linear:

$$\mu(x, \vartheta) = \vartheta_1 x_1 + \vartheta_2 x_2, \tag{7.2}$$

with the true parameter $\vartheta' = (1, 1)$, while the scanning family $\mathcal{A} = \{A_z : z \in \mathbb{R}\}$ is one of the examples mentioned in Section 6: $A_z = \{x \in \mathbb{R}^2 : x_1 \le z\}$. Let $W_n$, $w_n$ be as in Section 6.

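For the above regression function and scanning family, the matrix $C_z^{-1}$ has the form

$$C_z^{-1} = \frac{1}{[1 - r^2][1 - F(z)]}\begin{pmatrix} r^2 & -r \\ -r & 1 \end{pmatrix} + \frac{1}{1 - F(z)}\begin{pmatrix} (z a(z) + 1)^{-1} & 0 \\ 0 & 0 \end{pmatrix},$$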

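and the integral in (6.3) reduces to one-dimensional integrals of the form

$$\int_{-\infty}^{x_1} \frac{y\,a(y)}{y\,a(y) + 1}\,F\Big(\frac{x_2 - r y}{\sqrt{1 - r^2}}\Big)\,dy \quad\text{and}\quad \int_{-\infty}^{x_1} f\Big(\frac{x_2 - r y}{\sqrt{1 - r^2}}\Big)\,\frac{a(y)}{\sqrt{1 - r^2}}\,dy, \qquad x = (x_1, x_2)' \in \mathbb{R}^2.$$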

Here, as above, $f$ and $F$ denote the standard normal density and distribution function, respectively. In our simulations the class of sets $B$ was chosen to be $\{(-\infty, x] : x \in \mathbb{R}^2\}$. Write $W_n(x)$, $w_n(x)$ for $W_n(B)$, $w_n(B)$ when $B = (-\infty, x]$, $x \in \mathbb{R}^2$. Choosing $\psi(y) = y$ and $\hat\vartheta$ to be the least squares estimator, the transformed process (6.3) becomes

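$$w_n(x) = n^{-1/2}\sum_{i=1}^{n}\Big[ I\{X_i \le x\} - \int_{-\infty}^{x_1 \wedge X_{i1}} \frac{y\,a(y)}{y\,a(y) + 1}\,F\Big(\frac{x_2 - r y}{\sqrt{1 - r^2}}\Big)\,dy\; X_{i1} + \int_{-\infty}^{x_1 \wedge X_{i1}} f\Big(\frac{x_2 - r y}{\sqrt{1 - r^2}}\Big)\,a(y)\,dy\;\frac{X_{i2} - r X_{i1}}{\sqrt{1 - r^2}} \Big] \times \big(Y_i - \mu(X_i, \hat\vartheta)\big).$$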
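The closed form of $C_z^{-1}$ displayed above can be checked numerically. This Python sketch assumes, as our reading of Section 6 (not restated here), that $C_z = \int_{A_z^c} \dot\mu\,\dot\mu'\,dH$. For the linear model (7.2) that integral is $E[XX' I\{X_1 > z\}]$, with $E[X_1^2 I\{X_1 > z\}] = (1 - F(z))(z a(z) + 1)$. The sketch compares a direct $2 \times 2$ inversion with the displayed formula; all function names are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """1 - F(z) for the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def C_matrix(z, r):
    """E[X X' 1{X_1 > z}] for X ~ N(0, [[1, r], [r, 1]]) (assumed form of C_z for model (7.2))."""
    s = norm_sf(z)
    b = z * norm_pdf(z) + s  # E[X_1^2 1{X_1 > z}] = (1 - F(z)) (z a(z) + 1)
    return [[b, r * b], [r * b, r * r * b + (1.0 - r * r) * s]]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def C_inverse_closed_form(z, r):
    """The displayed formula for C_z^{-1}."""
    s = norm_sf(z)
    a = norm_pdf(z) / s
    c = 1.0 / ((1.0 - r * r) * s)
    return [[c * r * r + 1.0 / ((z * a + 1.0) * s), -c * r], [-c * r, c]]


worst = max(
    abs(inverse_2x2(C_matrix(z, r))[i][j] - C_inverse_closed_form(z, r)[i][j])
    for z in (-1.0, 0.0, 1.5)
    for r in (-0.5, 0.0, 0.5)
    for i in range(2)
    for j in range(2)
)
print(worst)  # of the order of floating-point rounding error
```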



Let $V_n := \sup_x |w_n(x)|$ and $V_H := \sup_x |w_H(x)|$.
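A minimal Python sketch of how $V_n$ could be computed for this design follows. For each $x_2$ on a grid it tabulates the two one-dimensional integrals appearing in $w_n(x)$ by the trapezoidal rule, then evaluates $w_n(x)$ on a finite grid of $x = (x_1, x_2)$. The supremum over $\mathbb{R}^2$ is therefore only approximated. The simulation helper assumes $N(0, 1)$ errors, which this section does not state, and obtains $\hat\vartheta$ from the least squares normal equations. `make_J`, `V_n` and `simulate_and_fit` are hypothetical names, and $|r| < 1$ is required.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hazard(y):
    """a(y) = f(y) / (1 - F(y))."""
    return norm_pdf(y) / (0.5 * math.erfc(y / math.sqrt(2.0)))


def make_J(x2, r, lo=-8.0, hi=8.0, steps=2000):
    """For fixed x_2, return u -> (J1(u), J2(u)), the two integrals in w_n(x) with upper limit u."""
    s = math.sqrt(1.0 - r * r)
    dy = (hi - lo) / steps

    def integrands(y):
        a = hazard(y)
        c = (x2 - r * y) / s
        return y * a / (y * a + 1.0) * norm_cdf(c), norm_pdf(c) * a

    j1, j2 = [0.0], [0.0]
    prev = integrands(lo)
    for k in range(1, steps + 1):
        cur = integrands(lo + k * dy)
        j1.append(j1[-1] + 0.5 * dy * (prev[0] + cur[0]))
        j2.append(j2[-1] + 0.5 * dy * (prev[1] + cur[1]))
        prev = cur

    def J(u):
        if u <= lo:
            return 0.0, 0.0
        if u >= hi:
            return j1[-1], j2[-1]
        pos = (u - lo) / dy
        k = min(int(pos), steps - 1)
        w = pos - k
        return (1 - w) * j1[k] + w * j1[k + 1], (1 - w) * j2[k] + w * j2[k + 1]

    return J


def V_n(X, resid, r, grid):
    """Approximate sup_x |w_n(x)| over grid x grid; X[i] = (X_i1, X_i2), resid[i] = Y_i - mu(X_i, theta_hat)."""
    s = math.sqrt(1.0 - r * r)
    best = 0.0
    for x2 in grid:
        J = make_J(x2, r)
        for x1 in grid:
            total = 0.0
            for (u1, u2), e in zip(X, resid):
                j1, j2 = J(min(x1, u1))
                ind = 1.0 if (u1 <= x1 and u2 <= x2) else 0.0
                total += (ind - j1 * u1 + j2 * (u2 - r * u1) / s) * e
            best = max(best, abs(total) / math.sqrt(len(X)))
    return best


def simulate_and_fit(n, r, seed=0):
    """Design (7.2) with theta = (1, 1) and (assumed) N(0, 1) errors; least squares residuals."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    X, Y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (z1, r * z1 + s * z2)
        X.append(x)
        Y.append(x[0] + x[1] + rng.gauss(0.0, 1.0))
    sxx = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
    sxy = [sum(x[i] * y for x, y in zip(X, Y)) for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] ** 2
    t1 = (sxx[1][1] * sxy[0] - sxx[0][1] * sxy[1]) / det
    t2 = (sxx[0][0] * sxy[1] - sxx[0][1] * sxy[0]) / det
    return X, [y - t1 * x[0] - t2 * x[1] for x, y in zip(X, Y)]
```

For example, `X, e = simulate_and_fit(40, 0.5, seed=1)` followed by `V_n(X, e, 0.5, [-3 + 0.25 * k for k in range(25)])`. A finer grid gives a better approximation of the supremum at a higher cost.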

To demonstrate how well the null distribution of $V_n$ is approximated by the distribution of $V_H$, we first had to understand the form of the latter distribution. We therefore obtained an approximation to the distribution of this r.v. as follows.

Let $H(x_1, x_2; r)$, $x = (x_1, x_2)' \in \mathbb{R}^2$, denote the d.f. of the bivariate normal distribution with standard marginals and correlation $r$. Let

$$H_r(s, t) := H\big(F^{-1}(s), F^{-1}(t); r\big), \qquad 0 \le s, t \le 1, \tag{7.3}$$

be the corresponding copula function, and let $w(s, t) := w_H\big(F^{-1}(s), F^{-1}(t)\big)$. The d.f. $P\big(\sup_{0 \le s, t \le 1} |w(s, t)| < v\big)$ is the limit, as $n$ tends to infinity, of (and is approximated by)

$$L_r(v) := P\Big(\sup_{0 \le s, t \le 1} \big|\xi_{nH_r}(s, t) - nH_r(s, t)\big|/\sqrt{n} < v\Big),$$

where $\xi_{nH_r}(s, t)$ is a Poisson process on $[0, 1]^2$ with expected value $nH_r(s, t)$.
Table 2 gives the simulated values of these probabilities for $r = -0.5, 0, 0.5$ and $n = 5K$, with $m = 20K$ replications. This table is based on the tables and graphs of the distribution function $L_r$ and percentile points prepared by Dr. R. Brownrigg, available at www.nics.vuw.ac.nz/~ray/Brownian.

Table 2
Selected values of $(v, L_r(v))$

(a) r = -0.5
v        0.71  0.88  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25
L_r(v)   0.00  0.01  0.05  0.25  0.50  0.69  0.82  0.91  0.95  0.98  0.99  0.995

(b) r = 0
v        0.66  0.84  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25
L_r(v)   0.00  0.01  0.07  0.30  0.53  0.72  0.84  0.91  0.95  0.98  0.99  0.995

(c) r = 0.5
v        0.59  0.79  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25
L_r(v)   0.00  0.01  0.11  0.35  0.57  0.74  0.85  0.92  0.96  0.98  0.99  0.996
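For $r = 0$ the copula (7.3) is $H_0(s, t) = st$, so $L_0(v)$ can be approximated by simulating a Poisson process of uniformly scattered points on the unit square. The Python sketch below does this with deliberately small $n$ and $m$ (the table above used $n = 5K$ and $m = 20K$). It evaluates the supremum exactly at the corners of the counting step function. For $r \ne 0$ one would also need the bivariate normal d.f., which this standard-library-only sketch does not attempt. All names are hypothetical.

```python
import bisect
import math
import random


def poisson_points_r0(n, rng):
    """Points of a Poisson process on [0, 1]^2 with mean measure n s t (copula H_0(s, t) = s t)."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival < n:  # the number of unit-rate arrivals in [0, n] is Poisson(n)
        count += 1
        arrival += rng.expovariate(1.0)
    return [(rng.random(), rng.random()) for _ in range(count)]


def sup_stat(points, n):
    """sup_{s,t} |N(s, t) - n s t| / sqrt(n), evaluated at the corners of the counting step function."""
    pts = sorted(points)  # sorted by the s-coordinate
    taus = sorted(t for _, t in pts) + [1.0]
    inserted = []  # sorted t-values of the first i points
    best = 0.0
    for i in range(len(pts) + 1):
        s_lo = pts[i - 1][0] if i > 0 else 0.0  # N - n s t is largest at s = s_(i)
        s_hi = pts[i][0] if i < len(pts) else 1.0  # n s t - N is largest just below s_(i+1)
        p = 0
        for tau in taus:
            while p < len(inserted) and inserted[p] < tau:
                p += 1  # p = number of inserted t-values below tau
            le = p + (1 if p < len(inserted) and inserted[p] == tau else 0)
            best = max(best, le - n * s_lo * tau, n * s_hi * tau - p)
        if i < len(pts):
            bisect.insort(inserted, pts[i][1])
    return best / math.sqrt(n)


def simulate_sups(n, m, seed=0):
    rng = random.Random(seed)
    return [sup_stat(poisson_points_r0(n, rng), n) for _ in range(m)]


def L_hat(v, sups):
    """Monte Carlo estimate of L_0(v)."""
    return sum(1 for x in sups if x < v) / len(sups)
```

With, for example, `sups = simulate_sups(100, 500, seed=1)`, the values `L_hat(v, sups)` can be compared roughly with panel (b); at such small $n$ some discrepancy from the $n = 5K$ values is to be expected.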

Although the distribution of $V_H$ depends on the copula function $H_r$, a first useful observation is that relatively sharp changes in $H_r$ do not appear to change the distribution of this r.v. by much. Table 3 lists a few selected percentiles so that the effect of $r$ on them can be assessed readily. It contains the values $v_\alpha$ defined by the relation $1 - L_r(v_\alpha) = \alpha$. One readily sees that these values are very stable across the three chosen values of $r$, especially for $\alpha \le 0.1$.
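Since $v_\alpha$ solves $1 - L_r(v_\alpha) = \alpha$, the entries of Table 3 can be roughly recovered from the grid of Table 2 by linear interpolation. The Python sketch below does this. The interpolation scheme is our own choice, and because Table 2 is rounded to two decimals, agreement with Table 3 can only be expected to within a few hundredths. The function name is hypothetical.

```python
# Table 2: v grid and L_r(v) for r = -0.5, 0, 0.5.
L_TABLE = {
    -0.5: ([0.71, 0.88, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25],
           [0.00, 0.01, 0.05, 0.25, 0.50, 0.69, 0.82, 0.91, 0.95, 0.98, 0.99, 0.995]),
    0.0: ([0.66, 0.84, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25],
          [0.00, 0.01, 0.07, 0.30, 0.53, 0.72, 0.84, 0.91, 0.95, 0.98, 0.99, 0.995]),
    0.5: ([0.59, 0.79, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25],
          [0.00, 0.01, 0.11, 0.35, 0.57, 0.74, 0.85, 0.92, 0.96, 0.98, 0.99, 0.996]),
}


def v_alpha(r, alpha):
    """Solve 1 - L_r(v) = alpha by linear interpolation in Table 2."""
    vs, ls = L_TABLE[r]
    p = 1.0 - alpha
    for k in range(1, len(vs)):
        if ls[k] >= p:
            return vs[k - 1] + (vs[k] - vs[k - 1]) * (p - ls[k - 1]) / (ls[k] - ls[k - 1])
    raise ValueError("alpha outside the range covered by Table 2")


for r in (-0.5, 0.0, 0.5):
    print(r, [round(v_alpha(r, a), 2) for a in (0.5, 0.25, 0.1)])
```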

We illustrate the closeness of the distribution of $V_n$ for finite $n$ to the limiting distribution by graphs of e.d.f.'s. Figure 2 shows the simulated d.f.'s of $V_n$ for $n = 40, 100$, based on $m = 10K$ replications, together with the approximating d.f. $L_r$ (solid line) for $H_r$ as in (7.3) with $r = -0.5$, $0$ and $0.5$. One readily notes the remarkable closeness of these d.f.'s, especially in the right tail.

Table 4 gives the simulated values of $P(V_n > v_\alpha)$ for several values of $\alpha$ and sample sizes $n = 40$ and $n = 100$, based on $m = 10K$ replications. From this table one also sees that the large sample approximation is reasonably good even for the sample size of 40, and fairly stable across the chosen values of $r$.

Table 3
Selected values of $v_\alpha$ for $r = -0.5, 0, 0.5$

r \ α    0.5    0.25   0.20   0.10   0.05   0.025  0.01
-0.5     1.50   1.86   1.95   2.23   2.50   2.74   3.03
 0.0     1.46   1.81   1.91   2.21   2.46   2.70   3.03
 0.5     1.42   1.77   1.88   2.17   2.43   2.70   2.98

Fig. 2. (a) E.d.f. of $V_n$, $n = 40, 100$, with $m = 10K$, and d.f. $L_r$, $r = -0.5$. (b) E.d.f. of $V_n$, $n = 40, 100$, with $m = 10K$, and d.f. $L_r$, $r = 0$. (c) E.d.f. of $V_n$, $n = 40, 100$, with $m = 10K$, and d.f. $L_r$, $r = 0.5$. (Curves labeled $n = 40$, $n = 100$ and $n = \infty$.)

Table 4
Simulated values of $P(V_n > v_\alpha)$

n      r \ α    0.2     0.1     0.05    0.01
40     -0.5     0.166   0.084   0.045   0.012
        0.0     0.166   0.085   0.045   0.011
        0.5     0.162   0.084   0.042   0.008
100    -0.5     0.179   0.092   0.046   0.009
        0.0     0.183   0.092   0.048   0.008
        0.5     0.178   0.093   0.046   0.009